Foreword



The last ten years have seen a gradual fragmentation of the Automated Reason- 
ing community into various disparate groups, each with its own conference: the 
Conference on Automated Reasoning (CADE), the International Workshop on 
First-Order Theorem Proving (FTP), and the International Conference on Au- 
tomated Reasoning with Analytic Tableau and Related Methods (TABLEAUX) 
to name three. During 1999, various members of these three communities dis- 
cussed the idea of holding a joint conference in 2001 to bring our communities 
together again. The plan was to hold a one-off conference for 2001, to be repeated 
if it proved a success. This volume contains the papers presented at the resulting 
event: the first International Joint Conference on Automated Reasoning (IJCAR 
2001), held in Siena, Italy, from June 18-23, 2001. 

We received 88 research papers and 24 system descriptions as submissions.
Each submission was fully refereed by at least three peers who were asked to 
write a report on the quality of the submissions. These reports were accessible to 
members of the programme committee via a web-based system specially designed 
for electronic discussions. As a result we accepted 37 research papers and 19 
system descriptions, which make up these proceedings. In addition, this volume 
contains full papers or extended abstracts from the five invited speakers. 

Ten one-day workshops and four tutorials were held during IJCAR 2001. The 
automated theorem proving system competition (CASC) was organized by Geoff 
Sutcliffe to evaluate the performance of sound, fully automatic, classical, first- 
order automated theorem proving systems. The third Workshop on Inference in 
Computational Semantics (ICoS-3) and the 9th Symposium on the Integration 
of Symbolic Computation and Mechanized Reasoning (CALCULEMUS-2001) 
were co-located with IJCAR 2001, and held their own associated workshops and
produced their own separate proceedings. 

We would like to acknowledge the enormous amount of work put in by the 
members of the program committee, the various steering committees, the IJCAR 
officials, and additional referees named on the following pages. In particular, 
we would like to thank Fabio Massacci and Marco Baioletti for organizing the 
conference itself, Gernot Salzer for installing and maintaining the software for 
our web-based reviewing procedure, and Gertrud Bauer for assembling these 
proceedings. Finally, we thank the sponsors named on the following pages for 
their financial support. 

Rajeev Goré, Alexander Leitsch and Tobias Nipkow

April 2001 
Program Termination Analysis by Size-Change
Graphs (Abstract) 



Neil D. Jones 

DIKU, University of Copenhagen, e-mail: neil@diku.dk 

Size-change analysis is based on size-change graphs giving local approximations 
to parameter size changes derivable from program syntax. The “size-change ter- 
mination” principle for a first-order functional language with well-founded data 
is: a program terminates on all inputs if every infinite call sequence (following 
program control flow) would cause an infinite descent in some data values. Two 
termination detection algorithms are given in [9]: one involving Büchi automata
that directly realizes the definition; and a more useful one involving a closure 
algorithm on a set of size-change graphs. 

Termination analysis based on this principle seems simpler, more general and 
more automatic than other work in the literature: lexicographic orders, mutually 
recursive function calls and permuted arguments are all handled automatically 
and without special treatment, with no need for human-supplied argument orders, 
or theorem-proving search methods not certain to terminate at analysis time. 

Finally, the problem’s intrinsic complexity is surprisingly high: complete for 
PSPACE. An interesting consequence: many other analyses found in the termina- 
tion and quasi-termination literature are also PSPACE hard. 

Some examples of terminating programs. 

1. Program with permuted parameters: 

p(m,n,r) = if r>0 then 1:p(m, r-1, n) else
           if n>0 then 2:p(r, n-1, m)
           else m

2. Program with permuted and possibly discarded parameters: 

f(x,y) = if y=[] then x else
         if x=[] then 1:f(y, tl y)
         else 2:f(y, tl x)

3. Function with lexically ordered parameters: 

a(m,n) = if m=0 then n+1 else
         if n=0 then 1:a(m-1, 1)
         else 2:a(m-1, 3:a(m,n-1))

Claim: These programs all terminate for a common reason: any infinite call 
sequence (regardless of test outcomes) causes infinite descent in one or more 
values. Examples 1, 2 seem to possess no natural lexical descent. In fact, the 
reasoning is necessarily tricky, since the problem is PSPACE-hard. Two algorithms 
are given in [9] to perform the test automatically. 
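
The closure-based test can be sketched in a few lines of Python. This is only an illustrative sketch under our own encoding, not the code of [9]: a size-change graph is a triple (caller, callee, arcs), an arc (x, y, strict) says that parameter y of the callee is bounded by parameter x of the caller, and the names compose, closure and size_change_terminates are hypothetical. It relies on the standard formulation of the criterion: a program is size-change terminating if every idempotent graph from a function to itself in the closure has a strictly decreasing arc from some parameter to itself.

```python
from itertools import product

DOWN, SAME = True, False  # arc labels: strict decrease / no increase


def graph(src, dst, *arcs):
    """Build a size-change graph for a call src -> dst (our own encoding)."""
    return (src, dst, frozenset(arcs))


def compose(g1, g2):
    """Compose graphs g1: f -> g and g2: g -> h into one graph f -> h."""
    (src, mid, arcs1), (mid2, dst, arcs2) = g1, g2
    assert mid == mid2, "graphs must describe consecutive calls"
    strict_of = {}
    for (x, y, s1), (y2, z, s2) in product(arcs1, arcs2):
        if y == y2:
            # A two-step path is strict if either of its arcs is strict.
            strict_of[(x, z)] = strict_of.get((x, z), False) or s1 or s2
    return (src, dst, frozenset((x, z, s) for (x, z), s in strict_of.items()))


def closure(graphs):
    """Close a set of size-change graphs under composition."""
    seen = set(graphs)
    work = list(seen)
    while work:
        g = work.pop()
        for h in list(seen):
            for a, b in ((g, h), (h, g)):
                if a[1] == b[0]:
                    c = compose(a, b)
                    if c not in seen:
                        seen.add(c)
                        work.append(c)
    return seen


def size_change_terminates(graphs):
    """Every idempotent self-graph must have a strict arc x -> x."""
    for g in closure(graphs):
        src, dst, arcs = g
        if src == dst and compose(g, g) == g:
            if not any(x == y and strict for x, y, strict in arcs):
                return False
    return True


# Hand-written graphs for the three example programs above.
EXAMPLE_1 = [
    graph("p", "p", ("m", "m", SAME), ("r", "n", DOWN), ("n", "r", SAME)),
    graph("p", "p", ("r", "m", SAME), ("n", "n", DOWN), ("m", "r", SAME)),
]
EXAMPLE_2 = [
    graph("f", "f", ("y", "x", SAME), ("y", "y", DOWN)),
    graph("f", "f", ("y", "x", SAME), ("x", "y", DOWN)),
]
EXAMPLE_3 = [
    graph("a", "a", ("m", "m", DOWN)),                    # call 1
    graph("a", "a", ("m", "m", DOWN)),                    # call 2
    graph("a", "a", ("m", "m", SAME), ("n", "n", DOWN)),  # call 3
]
# A loop that only swaps its arguments, g(x,y) = g(y,x), for contrast.
SWAP_LOOP = [graph("g", "g", ("x", "y", SAME), ("y", "x", SAME))]

for name, gs in [("1", EXAMPLE_1), ("2", EXAMPLE_2), ("3", EXAMPLE_3),
                 ("swap", SWAP_LOOP)]:
    print(name, size_change_terminates(gs))
```

Under this encoding the three example programs pass the test and the argument-swapping loop fails it. The closure can grow exponentially in the number of parameters, which is consistent with the PSPACE-hardness of the problem.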

Theorem 1. Size-change termination is decidable in polynomial space. 
Theorem 2. Size-change termination is PSPACE-hard.

A known PSPACE-complete problem: given a Boolean program b, decide
whether b terminates. Proof idea: given a Boolean program b, construct a pro-
gram p of size polynomial in the size of b such that b terminates if and only if p
is not size-change terminating.

Corollary 1. The termination and quasi-termination criteria of [2,5,7,10,13] 
all are PSPACE-hard. 

Proof. Point: These analyses all give correct results when applied to programs 
whose data flow is similar to that of p above. The proof is essentially the same, 
with the construction modified as necessary to make the program fail the condi- 
tion tested by the respective method, just when the Boolean program terminates. 

Related Work. The PSPACE lower bound is the first such result of which we
are aware. The termination algorithm, though, has counterparts in other areas. 

Typed functional programs. Abel and Altenkirch [1] developed a system called 
foetus that accepts as input mutually recursive function definitions over strictly
positive datatypes. It returns a lexical ordering on the arguments of the pro-
gram’s functions, if one exists. The method of [9] handles programs with or
without such a lexical ordering. 

Term rewriting systems. TRS termination analyses often perform expensive 
searches for a suitable ordering to solve a set of inequalities; e.g., in [15], a 
heuristic is given for automatically generating a general class of transformation 
orderings, which includes the lexical order. In the present work, it has not been 
the aim to look for orderings. Size-change termination naturally subsumes an 
interesting class of orderings, including the lexical ordering, and the ordering for 
the example with permuted and discarded parameters, which is not obvious. 

One TRS application is to model semantics of functional programs. A func- 
tional program is easily translated into a TRS whose termination implies that of 
the subject program. Unfortunately, the result is often non-simply-terminating, 
which means the usual approach (find an order so the LHS of each rewrite rule 
is strictly greater than the RHS), does not work. To treat such TRS, Arts and 
Giesl [3,4,6] applied programming intuition to develop stronger methods. Unfor- 
tunately, this required extending existing techniques for TRS termination; and 
expensive searches for suitable orderings. From a term-rewriting perspective, these
methods are able to handle a larger class of TRS than dealt with before. For
analyzing functional programs, our dataflow approach seems less circuitous. 

Finally, for TRS corresponding to programs, the polynomial interpretation 
method for discovering orderings [6] can sometimes provide an alternative to size 
analysis by appropriately interpreting function symbols in the subject program 
(when it succeeds). The approach in [9] is instead to factor out size analysis 
as an orthogonal concern, and focus on the size-change termination principle 
and its application. This appears to give a natural separation of concerns when 
analyzing termination of programs. 
Logic programs. There has been extensive research on automatic termination
analysis for logic programs. As explained in [13], it is not always obvious that a 
predicate will terminate when executed with unusual instantiation patterns, or 
that a predicate always terminates on backtracking. For interpreters that have 
a choice of evaluation orders, termination analysis is especially important. 

Some analyses that have been described for logic programs (e.g., in [11,14]) 
use a simple criterion: for every recursive invocation of a predicate, determine 
that the sum over a subset of input fields (fixed for each predicate) is strictly 
decreased. This does not allow handling of lexical descent. The strength of these 
methods derives from aggressive size analysis, which enables, in particular, sort- 
ing routines (quicksort and insertion sort) to be handled automatically. It is also 
possible to incorporate size analysis into the present approach, but our aim has 
been to investigate the size-change termination principle by itself. 

Some logic program termination analyzers use a termination criterion com- 
patible with size-change termination [10,5]. The analysis in [13] has been ex- 
tended to a termination analyzer for Prolog programs called Termilog [10]. It 
turns out that Termilog can solve size-change termination problems precisely via 
a suitable encoding. In fact, our graph-based algorithm, although devised inde- 
pendently, is in essence a functional programming counterpart of the Termilog 
algorithm. Thus the PSPACE-hardness result applies to Termilog’s analysis.

All the works on Prolog termination that we are aware of devote much at- 
tention to orthogonal issues such as uninstantiated variables and size analysis. 
While no doubt important in practice, an impression is created that the com- 
plexity of Prolog termination stems from these concerns; but our complexity 
result says that the core size-change termination principle is intrinsically hard.

Quasi-termination. The PSPACE-hardness construction can be modified to show
that the in-situ descent criterion for quasi-termination in [2] is PSPACE-hard.
Glenstrup [7] shows one way that quasi-termination analysis techniques can be
used for termination.
1 Introduction

Security protocols aim to protect the honest users of a network from the dishon- 
est ones. Asymmetric (public key) cryptography is valuable, though it is normally 
used in conjunction with symmetric cryptography, where two users share a secret 
key. Asymmetric cryptography is typically used to securely exchange symmetric 
keys, which carry the bulk of the traffic. This mode of operation is faster than 
using expensive public-key encryption exclusively. It is also more secure, since 
the symmetric keys can be changed frequently. However, the protocol used to set 
up this communication must be designed with care. For example, each message
typically includes a nonce: a freshly generated number that the other party
must include in his response; the first party then knows that the response was 
not an old message replayed by an intruder. Many flaws have been discovered in 
security protocols [5]. 

Security protocol verification technologies have progressed in recent years. A 
variety of tools are available for analyzing protocols. Model checking is excellent 
for debugging a protocol, finding attacks in seconds [6,7]. Theorem proving is 
valuable too: it can analyze protocols in more detail and handles the protocols 
that are too big for model checking. Subgoals presented to the user suggest 
possible failure modes and give insights into how the protocol operates. 

Past work on protocol verification has focused on protocols arising from the 
academic community. Only seldom have deployed protocols been investigated, 
such as Kerberos [3], SSL [8] and SSL’s successor, TLS [12]. Past work has largely 
focused on key exchange protocols. Such protocols allow two participants (in- 
variably called Alice and Bob) to agree on a session key: a short-term symmetric 
key. In this paper, I describe a project, joint with Bella, Massacci
and Tramontano, to verify a very large commercial protocol: SET, or Secure
Electronic Transactions [15]. 

2 The SET Protocol 

People normally pay for goods purchased over the Internet using a credit card. 
They give their card number to the merchant, who claims the cost of the goods 
against it. To prevent eavesdroppers from stealing the card number, the transac-
tion is encrypted using the SSL protocol. This arrangement requires the customer
and merchant to trust each other: an undesirable requirement even in face-to-face 
transactions, and across the Internet it admits unacceptable risks. 

— The cardholder is protected from eavesdroppers but not from the merchant 
himself. Some merchants are dishonest: pornographers have charged more 
than the advertised price, expecting their customers to be too embarrassed 
to complain. Some merchants are incompetent: a million credit card numbers 
have recently been stolen from Internet sites whose managers had not applied 
patches (available free from Microsoft) to fix security holes [9]. 

— The merchant has no protection against dishonest customers who supply an 
invalid credit card number or who claim a refund from their bank without 
cause. Contrary to popular belief, it is not the cardholder but the merchant 
who has the most to lose from fraud. Legislation in most countries protects 
the consumer. 

The SET protocol aims to reduce fraud by introducing a preliminary regis- 
tration phase. Both cardholders and merchants must register with a certificate 
authority (CA) before they can engage in transactions. The cardholder thereby 
obtains electronic credentials to prove that he is trustworthy. The merchant 
similarly registers and obtains credentials. These credentials do not contain sen- 
sitive details such as credit card numbers. Later, when the customer wants to 
make purchases, he and the merchant exchange their credentials. If both parties 
are satisfied then they can proceed with the transaction. Credentials must be 
renewed every few years, and presumably are not issued to known fraudsters. 

SET comprises 15 subprotocols, or transactions, in all. Some observers, noting 
its extreme complexity, predict that it will never be deployed. However, the 
recent large rise in credit card fraud [1] suggests that current arrangements are 
unsustainable. SET or a derivative protocol may well be deployed in the next 
several years. To a researcher, SET has a further attraction: it makes heavy use 
of primitives such as digital envelopes that protocol verifiers have not examined 
before now. 



3 Cardholder Registration 

As described above, each cardholder must register before he is allowed to make 
purchases. He proves his identity by supplying personal information previously 
shared with his issuing bank. He chooses a private key, which he will use later 
to sign orders for goods, and registers the corresponding public key, which mer- 
chants can use to verify his signature. In keeping with normal practice, SET 
requires each participant to have separate key pairs for signature and encryp- 
tion. 

Cardholder registration comprises six messages: 

1. The cardholder contacts the CA to request registration. 




SET Cardholder Registration: The Secrecy Proofs 




2. The CA replies, returning its public key certificates. These contain the CA’s 
public keys (which the cardholder needs for the next phase) and are signed 
by the Root Certificate Authority (so that the cardholder knows they are 
genuine). 

3. The cardholder requests a registration form. In this message, he submits his 
credit card number to the CA. 

4. The CA uses the credit card number to determine the cardholder’s issuing 
bank and returns an appropriate registration form. 

5. The cardholder chooses an asymmetric public/private key pair. He submits 
the public key along with the completed registration form to the CA, who 
forwards it to the bank. 

6. The bank checks the various details, and if satisfied, authorises the CA to 
issue credentials. The CA signs a certificate that includes the cardholder’s 
public signature key and the cryptographic hash of a number — the PAN- 
Secret — known only to the CA and cardholder. Finally the cardholder 
receives the credentials and is ready to go shopping. 

Does verifying cardholder registration serve any purpose? The payment phase 
performs the actual E-commerce, and protocol verifiers often assume that partic- 
ipants already possess all needed credentials. However, cardholder registration is 
a challenging protocol, particularly when it comes to proving that the PANSecret 
is actually secret. 

The most interesting feature of cardholder registration, from the viewpoint 
of verification, is its use of digital envelopes. To send a long message to the CA, 
the cardholder generates a fresh symmetric key and encrypts the message, using 
public key encryption only to deliver the session key to the CA. As mentioned at 
the start of this paper, this combination of symmetric and asymmetric encryption 
is more efficient and secure than using asymmetric encryption alone. However, 
the two-stage process makes a protocol harder to analyze. The most complicated 
case is with the last message exchange, where the cardholder sends the CA two 
session keys. One of these keys encrypts the cardholder’s message and the other 
encrypts the CA’s reply. 

We could simplify the protocol by eliminating digital envelopes and remov- 
ing unnecessary encryption. However, the resulting protocol would be trivial. 
Experience shows that simplifying out implementation details can hide major 
errors [14]. Cardholder registration is valuable preparation for the eventual ver- 
ification of the purchase phase. 

4 The Secrecy Proofs 

We use the inductive method of protocol verification, which has been described 
elsewhere [11,13]. This operational semantics assumes a population of honest 
agents obeying the protocol and a dishonest agent (the Spy) who can steal mes- 
sages intended for other agents, decrypt them using any keys at his disposal and 
send new messages as he pleases. Some of the honest agents are compromised,
meaning the Spy has full access to their secrets. A protocol is modelled by the
set of all possible traces of events that it can generate. Events are of three forms: 

— Says A B X means A sends message X to B. 

— Gets A X means A receives message X.

— Notes A X means A stores X in its internal state.

The model of Cardholder Registration is largely the work of Bella, Massacci 
and Tramontano, who devoted many hours to decrypting 1000 pages of SET
documentation [2]. We have flattened the hierarchy of certificate authorities. 
The Root Certificate Authority is responsible for certifying all the other CAs. 
Our model includes compromised CAs — as naturally it should — though we 
assume that the root is uncompromised. The compromised CAs complicate the 
proofs considerably, since large numbers of session keys and other secrets fall 
into the hands of the Spy. Here is a brief summary of the notation: 

— set_cr is the set of traces allowed by Cardholder Registration 

— used is the set of items appearing in the trace, to express freshness 

— symKeys is the set of symmetric keys^

— Nonce, Key, Agent, Crypt and Hash are message constructors 

— {|X1, ..., Xn|} is an n-component message

Here is part of the specification, the inductive rule for message 5. Variable 
evs5 refers to the current event trace: 

[| evs5 ∈ set_cr; C = Cardholder k;
   Nonce NC3 ∉ used evs5; Nonce CardSecret ∉ used evs5; NC3 ≠ CardSecret;
   Key KC2 ∉ used evs5; KC2 ∈ symKeys;
   Key KC3 ∉ used evs5; KC3 ∈ symKeys; KC2 ≠ KC3;
   cardSK ∉ symKeys;
   Gets C ... ∈ set evs5;
   Says C (CA i) ... ∈ set evs5 |]
==> Says C (CA i)
      {|Crypt KC3 {|Agent C, Nonce NC3, Key KC2, Key cardSK,
                    Crypt (invKey cardSK)
                      (Hash {|Agent C, Nonce NC3, Key KC2,
                              Key cardSK, Pan (pan C), Nonce CardSecret|})|},
        Crypt EKi {|Key KC3, Pan (pan C), Nonce CardSecret|}|}
    # evs5 ∈ set_cr

Much has been elided from this rule, but we can see several things: 

— the generation of two fresh nonces, NC3 and CardSecret 

— the generation of two fresh symmetric keys, KC2 and KC3, to be used as session 
keys 

— a message encrypted using EKi (the CA’s public key) and containing the 
credit card number (pan C) and the key KC3

^ In an implementation, a symmetric key occupies 8 bytes while an asymmetric one 
occupies typically 128 bytes, so the two types are easily distinguishable. 
— a message encrypted using KC3 and containing the symmetric key KC2 and
the cardholder’s public signature key, cardSK 

The two encrypted messages constitute a digital envelope. 
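
To make the envelope concrete, here is a small symbolic sketch in Python. It is not the Isabelle model: messages are nested tuples, the function analz below is a simplified stand-in for the operator of the same name in the inductive method (it splits compound messages and opens a ciphertext only when the matching decryption key is already known), the name priEKi for the CA's private encryption key is our own, and the signed hash in message 5 is omitted for brevity.

```python
def inv_key(name):
    """Inverse of a key: EKi/priEKi form a pair, symmetric keys invert to themselves."""
    pairs = {"EKi": "priEKi", "priEKi": "EKi"}  # priEKi is a hypothetical name
    return pairs.get(name, name)


def analz(known):
    """Everything derivable from `known` by splitting tuples and decrypting."""
    derived = set(known)
    changed = True
    while changed:
        changed = False
        for msg in list(derived):
            new = ()
            if msg[0] == "MPair":
                new = msg[1]                      # components of {|X1, ..., Xn|}
            elif msg[0] == "Crypt" and ("Key", inv_key(msg[1])) in derived:
                new = (msg[2],)                   # body of an opened ciphertext
            for part in new:
                if part not in derived:
                    derived.add(part)
                    changed = True
    return derived


# Simplified message 5: the long message under KC3, and KC3 itself under EKi.
inner = ("Crypt", "KC3", ("MPair", (("Agent", "C"), ("Nonce", "NC3"),
                                    ("Key", "KC2"), ("Key", "cardSK"))))
outer = ("Crypt", "EKi", ("MPair", (("Key", "KC3"), ("Pan", "pan C"),
                                    ("Nonce", "CardSecret"))))
message5 = ("MPair", (inner, outer))

# The Spy sees the message on the network.
honest_case = analz({message5})
# Same traffic, but the CA is compromised, so the Spy also holds priEKi.
compromised_case = analz({message5, ("Key", "priEKi")})

print(("Key", "KC2") in honest_case, ("Key", "KC2") in compromised_case)
```

With an uncompromised CA neither session key is derivable. Once the CA's private key is known, KC3 falls, and KC2 and CardSecret fall with it, which is the kind of chained loss the secrecy proofs have to rule out.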

The PANSecret mentioned in §3 above is computed as the exclusive-OR of 
other secret numbers generated by the cardholder and the CA. Do these num- 
bers really remain secret? Since they are encrypted using symmetric keys, the 
proof requires a lemma that symmetric keys remain secret. Two complications 
are that some symmetric keys do not remain secret, namely those involving a 
compromised CA, and that some symmetric keys are used to encrypt others. 
The latter point means that the loss of one key can compromise a second key, 
leading possibly to unlimited losses. 

The problem of one secret depending on another has occurred previously, 
with the Yahalom [10] and Kerberos [3] protocols. Both of these are simple: 
the dependency relation links only two items. Cardholder registration has many 
dependency relationships. It also has a dependency chain of length three: in the 
last message, a secret number is encrypted using a key (KC2) that was itself
encrypted using another key (KC3).

Fortunately, the method described in earlier work generalizes naturally to this 
case and to chains of any length. While the definitions become more complicated 
than before, they follow a uniform pattern. The idea is to define a relation, for 
a given trace, between pairs of secret items: (K, X) are related if the loss of the
key K leads to the loss of the key or nonce X. Two new observations can be 
made about the dependency relation: 

— It should ignore messages sent by the Spy, since nothing belonging to him 
counts as secret. This greatly simplifies some proofs. 

— It must be transitive, since a dependency chain leading to a compromise 
could have any length. Past protocols were too simple to reveal this point. 
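
The two observations can be illustrated by a toy computation, again in Python and again only a sketch. The event format (sender, encrypting key, protected item), the function name dependency_relation and the item names ca_secret and KS are assumptions made for the example; the actual relations are defined in Isabelle by recursion over the trace.

```python
def dependency_relation(events):
    """Pairs (K, X) such that losing key K leads to losing item X.

    `events` is a list of (sender, key, item) triples meaning that
    `sender` sent `item` encrypted under `key`.
    """
    # First observation: ignore anything sent by the Spy.
    related = {(k, x) for sender, k, x in events if sender != "Spy"}
    # Second observation: close the relation under transitivity.
    changed = True
    while changed:
        changed = False
        for (a, b) in list(related):
            for (c, d) in list(related):
                if b == c and (a, d) not in related:
                    related.add((a, d))
                    changed = True
    return related


events = [
    ("Cardholder", "KC3", "KC2"),   # message 5: KC3 protects KC2
    ("CA", "KC2", "ca_secret"),     # last message: KC2 protects a secret number
    ("Spy", "KS", "KC3"),           # Spy traffic, not counted
]
relation = dependency_relation(events)
print(sorted(relation))
```

Here the pair (KC3, ca_secret) appears only because of the transitive step, and nothing involving the Spy's own key KS is recorded.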

Secrecy of session keys is proved as it was for Kerberos IV [3], by defining 
the relation KeyCryptKey DK K evs. This relation captures instances of message 5 
in which somebody other than the Spy uses KC3 to encrypt KC2 in the event 
trace evs. The session key compromise theorem states that a given key can be 
lost only by the keys related to it by KeyCryptKey. The form of this lemma has 
been discussed elsewhere [10]; it handles cases of the induction in which some 
session keys are compromised. Using this lemma, we can prove that no symmetric 
keys are lost in a communication between honest participants: 

[| CA i ∉ bad; K ∈ symKeys; evs ∈ set_cr;
   Says (Cardholder k) (CA i) X ∈ set evs; Key K ∈ parts {X} |]
==> Key K ∉ analz (knows Spy evs)

Any symmetric key that is part of a message X sent by a cardholder (that is, Key
K ∈ parts {X}) is not derivable from material visible to the Spy (that is, Key K
∉ analz (knows Spy evs)).

Given that the session keys are secure, we might hope to find a simple proof 
that nonces encrypted using those keys remain secret. However, secrecy proofs 
for nonces require the same treatment as secrecy proofs for keys. We must define
the dependency relation between keys and nonces and prove a lemma analogous 
to the one shown above. 

Secrecy of nonces is proved as it was for Yahalom [10], except that there 
are many key-nonce relationships rather than one. Note also the occurrences of 
KeyCryptKey, which allow for longer dependency chains. 

KeyCryptNonce DK N (ev # evs) =
  (KeyCryptNonce DK N evs ∨
   (case ev of
      Says A B Z ⇒
        A ≠ Spy ∧
        ((∃X Y. Z = {|Crypt DK {|Agent A, Nonce N, X|}, Y|}) ∨
         (∃K i X Y.
            Z = Crypt K {|sign (priSK i) {|Agent B, Nonce N, X|}, Y|} ∧
            (DK=K ∨ KeyCryptKey DK K evs)) ∨
         (∃K i NC3 Y.
            Z = Crypt K
                  {|sign (priSK i) {|Agent B, Nonce NC3, Agent (CA i), Nonce N|},
                    Y|} ∧
            (DK=K ∨ KeyCryptKey DK K evs)) ∨
         (∃i. DK = priEK i))
    | Gets A’ X ⇒ False
    | Notes A’ X ⇒ False))

Finally, we can show that the secrets exchanged by the parties in the final 
handshake remain secure. 

[| CA i ∉ bad;
   Says (Cardholder k) (CA i)
     {|X, Crypt EKi {|Key KC3, Pan p, Nonce CardSecret|}|} ∈ set evs;
   ...; evs ∈ set_cr |]
==> Nonce CardSecret ∉ analz (knows Spy evs)

This theorem concerns the cardholder’s secret. There is an analogous one for the 
CA’s secret. 

G. Bella has proved that the credit card number also remains secret. It looks 
straightforward: the number is encrypted using the CA’s public key, which is 
secure provided the CA is uncompromised. As usual, however, the proof is harder 
than it looks. It requires a lemma stating that no symmetric keys are of any use 
to the Spy for stealing a credit card number. This lemma looks obvious too, but
both it and the main theorem are non-trivial inductions. Their proofs together 
require about one CPU minute. 

Why are proofs so difficult and slow? The digital envelopes and digital sig- 
nature conventions are to blame. Compared with other protocols analyzed using 
the inductive method, cardholder registration has nested encryption, resulting in 
huge case splits. The verifier is sometimes presented with a giant subgoal span- 
ning many pages of text. One should not attempt to prove such a monstrosity 
but instead to improve the simplification so that it does not occur again. 
5 Observations about Cardholder Registration

The proofs suggest that cardholder registration is secure. However, some anoma- 
lous features come to light. These do not derive from the formal analysis but 
merely from a close inspection of the protocol.

There is unnecessary encryption. The cardholder’s signature verification key 
is encrypted, when it is a public key! The cardholder certificate is also encrypted, 
when it is of no use to anyone but the cardholder. Public-key certificates are
nearly always sent in clear; this encryption is presumably intended to strengthen 
confidence in SET and to reassure cardholders. Nonces whose purpose is to 
ensure freshness do not have to be encrypted, but in SET they usually are. This 
forces KeyCryptNonce to take them into account, increasing the expression blow- 
up in secrecy proofs. We have a paradox: protocol designers who are concerned 
about security will include additional encryption, but that encryption actually 
makes the protocol more difficult to verify. 

I observed two insecurities. The cardholder is not required to generate a fresh 
signature key pair, but may register an old one. There is a risk that this old one 
could be compromised. SET accordingly includes a further security measure: a 
secret number known to the cardholder, which he later uses as a password. This 
PANSecret is the exclusive-OR of numbers chosen by the two parties (see §4), 
and the cardholder chooses his number before the CA does. Since exclusive-OR 
is invertible, a criminal working for a CA can give every cardholder the same 
PANSecret. 

This combination of insecurities introduces some risk that a criminal could 
impersonate the cardholder. The cardholder’s implementation of SET can repair 
the first defect by always generating a fresh signature key pair. The second defect 
is, in principle, easy to fix: simply change the computation of the PANSecret, 
replacing the exclusive-OR by cryptographic hashing. But that unfortunately is 
a change to the protocol itself. 
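
The second defect and its repair can be shown with a few lines of Python. This is a sketch under stated assumptions: the 20-byte secrets, the name ca_secret for the CA's number, and the use of SHA-256 over the concatenated secrets are our choices for illustration and are not taken from the SET specification.

```python
import hashlib


def pan_secret_xor(card_secret: bytes, ca_secret: bytes) -> bytes:
    """PANSecret as the exclusive-OR of the two parties' numbers."""
    return bytes(a ^ b for a, b in zip(card_secret, ca_secret))


def malicious_ca_secret(card_secret: bytes, target: bytes) -> bytes:
    """A dishonest CA sees card_secret first and solves for its own number."""
    return pan_secret_xor(card_secret, target)


def pan_secret_hash(card_secret: bytes, ca_secret: bytes) -> bytes:
    """Proposed repair: hash both numbers together (SHA-256 is an assumption)."""
    return hashlib.sha256(card_secret + ca_secret).digest()


# Three cardholders with different secrets, one target value chosen by the CA.
card_secrets = [bytes([i]) * 20 for i in (1, 2, 3)]
target = b"\x42" * 20

xor_results = {
    pan_secret_xor(cs, malicious_ca_secret(cs, target)) for cs in card_secrets
}
hash_results = {
    pan_secret_hash(cs, malicious_ca_secret(cs, target)) for cs in card_secrets
}

print(len(xor_results), len(hash_results))
```

With exclusive-OR the dishonest CA forces every cardholder to the same PANSecret. With hashing, the same choice of numbers no longer gives it control over the result; forcing a chosen value would require inverting the hash function.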



6 Conclusions 

Our joint work has been fruitful. We have been able to specify and verify card-
holder registration. Our model is abstract but retains much detail. We can prove
secrecy in the presence of digital envelopes. We have strengthened our previous 
work on the relationships between secrets. There must be a connection with Co- 
hen’s secrecy invariant [4], though I am not sure of the details. We look forward 
to analyzing the remainder of SET. 
Abstract. Algorithms and datastructures form the kernel of any effi-
cient theorem prover. In this abstract we discuss research on algorithms
and datastructures for efficient theorem proving based on our experi- 
ence with the theorem prover Vampire. We also briefly overview other 
works related to algorithms and datastructures, and to efficient theorem 
proving in general. 



1 Introduction 

To implement an efficient automatic theorem prover, one has to put together at 
least three ingredients: good theory, efficient algorithms and datastructures, and 
clever heuristics.^ In recent years, considerable progress has been made in
the theory of resolution-based systems. This theory is built upon completeness
theorems for resolution calculi with notions of redundant inferences and redun- 
dant derivations [6,63]. The theory is well-understood, but a good theory alone 
is not enough to implement an efficient prover. 

The progress in theory can be characterized by the following observation: a 
prover based on the theory known in 1970 would now be hopelessly inefficient 
(unrestricted resolution and paramodulation, use of function reflexivity axioms).
A prover based on the theory known in 1980 would outperform a 1970 prover 
by several orders of magnitude on difficult problems (mainly because of the use 
of simplification orderings). Compared to a 1980 prover, a prover based on the 
theory of 1990 would be several orders of magnitude faster on many difficult 
problems and moreover more flexible due to some new theoretical results (the 
general theory of redundancy, selection functions). 

Since the first paper on resolution theorem proving [77], many theorem 
provers were developed by various researchers and groups. The most consis- 
tent implementation efforts were undertaken at Argonne National Laboratory; 
they resulted in the development of a series of systems, including Logic Machine 
Architecture [44,43] and Otter [51]. The nature of research at Argonne was for- 
mulated in [42]: controlling redundancy in large search spaces. Recently, several 
new efficient first-order theorem provers emerged, including the resolution-based 
provers Spass [96,95], E [78,79], Gandalf [89], Vampire [70,75], Bliksem [16], and 

^ See [56] for a similar observation. 
SCOTT [82]; the model-elimination based prover Setheo [40]; and the equational
provers Waldmeister [34] and Fiesta (see [59]). 

These provers are extremely efficient. For example, the Steamroller problem
discussed some time ago in the literature [87] is now a trivial problem for all 
of these systems. Several open problems solved by Otter in 1993 [50] can now 
be routinely solved by some of these provers. However, there are many first- 
order problems coming from various applications, which are still beyond the 
capabilities of the state-of-the-art provers. Since first-order logic is undecidable, 
it is unreasonable to expect new systems to efficiently solve all of them in the 
near future. However, if we can increase performance of the modern provers by 
several orders of magnitude for a large number of such problems, many of these 
problems will be routinely solved, thus saving time for application developers. 

In our opinion, such a drastic increase in efficiency in the near future will 
be mainly based on the development of new algorithms and datastructures, and 
understanding how the theory developed so far can be efficiently implemented on 
top of the existing architectures of theorem provers. 

2 How Do Efficient Datastructures Influence Performance 

Before implementing Vampire, I implemented several more or less functioning 
theorem provers in REFAL [91], LISP, and Prolog. Efficiency was not an aim. I 
was interested in comparing their behavior when different methods were imple- 
mented (for example, [92] implemented the inverse method). 

In 1993 Dominique Bolignano invited me to visit his research group at Bull, 
near Paris, for about two months. As part of my visit to Bull I gave a talk on 
theorem proving by the inverse method and seemed to convince the audience 
that the inverse method is worth trying. However, my old implementation of the 
inverse method was not maintained, while the new implementation did not exist. 
Then I decided to implement an efficient prover. The development of the new
prover Vampire has considerably changed my perception of automated theorem
proving. The first surprise came when I compared the behaviour of my newly
implemented prover with that of Otter. When both provers used hyperresolution
for solving the same problems, in the first few seconds of proof-search their
performance was comparable. But after a few seconds Otter kept making
inferences at the same pace, while Vampire seemed to enter a deadlock. Simple
profiling showed that nearly all of the running time was spent on forward
subsumption. So my first exercise in efficient theorem proving was to implement
forward subsumption efficiently.^ A search of the literature on efficient subsumption
did not help very much: although I could find a paper on efficient subsumption 
[29], the algorithm of this paper was not very useful when every newly generated 
clause had to be checked for subsumption with over 10® clauses in the current 
search space. 

^ The value of subsumption was also observed by the Argonne group [97].

The search for an efficient subsumption algorithm resulted in the discovery of the code
tree indexing technique described in [93] and improved in [72]. After the
implementation of code trees for forward subsumption, the performance of Vampire when
running hyperresolution was comparable to that of Otter. Since 1998 Vampire
has been implemented and maintained by Alexandre Riazanov. The rest of this paper
describes some aspects of what Vampire has taught me about efficient theorem
proving.



3 Saturation-Based Resolution Theorem Proving 

All of the modern resolution-based first-order theorem provers implement some 
variant of the given clause algorithm [51,42]. Several variants of this algo- 
rithm are overviewed in [71,95]. One of the versions, roughly corresponding to 
those used in Otter and Fiesta, is shown in Figure 1 (taken from [71]). It is 
parametrized by several procedures explained below: 

— select is the clause selection function. It decides which clause should be
selected for activation.

— infer is the function that performs inferences between the current clause
current and the set of active clauses active. This function returns the set of
clauses obtained by all such possible inferences. This function varies from
system to system. Usually, infer applies inferences in some complete
inference system of resolution with paramodulation.

— simplify(set, by) is a procedure that performs simplification. It deletes
redundant clauses from set and simplifies some clauses in set using the clauses
in by. To preserve completeness, the simplified clauses are always moved to
passive. Typically, deleted clauses include tautologies and those clauses
subsumed by clauses in by. A typical example of simplification is rewriting by
unit equalities in by.

— Likewise, inner_simplify simplifies clauses in new using other clauses in new.

When we simplify new using the clauses in active ∪ passive, we speak of
forward simplification; when we simplify active and passive using the clauses in
new, we speak of backward simplification. The name given clause algorithm is
due to the fact that the clause current is called the given clause in Otter’s
terminology.
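To make the roles of select, infer, and simplify concrete, the following is a minimal sketch of the loop of Figure 1 restricted to ground (propositional) clauses. It is an illustration under stated assumptions, not the code of Otter or Vampire: clauses are frozensets of nonzero integers (a negative integer is a negated atom), select picks the lightest clause, infer is binary resolution, and simplification is limited to tautology deletion and subsumption. All function names are hypothetical.

```python
from itertools import chain

def resolvents(c, d):
    """All binary resolvents of two ground clauses (frozensets of nonzero ints)."""
    return {(c - {lit}) | (d - {-lit}) for lit in c if -lit in d}

def is_tautology(c):
    return any(-lit in c for lit in c)

def subsumed(c, by):
    """Ground subsumption: some clause in `by` is a subset of c."""
    return any(d <= c for d in by)

def given_clause(init):
    active, passive = set(), {frozenset(c) for c in init}
    if frozenset() in passive:
        return "provable"
    while passive:
        # select: lightest clause first, ties broken deterministically
        current = min(passive, key=lambda c: (len(c), sorted(c)))
        passive.remove(current)
        active.add(current)
        # infer: resolve the given clause against every active clause
        new = set(chain.from_iterable(resolvents(current, a) for a in active))
        if frozenset() in new:
            return "provable"
        # inner simplification: drop tautologies and clauses subsumed within new
        new = {c for c in new if not is_tautology(c)}
        new = {c for c in new if not any(d < c for d in new)}
        # forward simplification: new against active and passive
        new = {c for c in new if not subsumed(c, active | passive)}
        # backward simplification: active and passive against new
        active = {c for c in active if not subsumed(c, new)}
        passive = {c for c in passive if not subsumed(c, new)}
        passive |= new
    return "unprovable"
```

In the ground case simplification only deletes clauses, so the empty clause can only appear among the freshly inferred clauses; the first-order loop needs the additional goal checks shown in Figure 1.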

It is instructive to explain several problems facing implementors of the given 
clause algorithm, to show the gap between the theory and practice of resolution- 
based theorem proving. In theory, it is not hard to prove that this algorithm is 
complete, provided that the underlying logical calculus is complete. In practice, 
at least two problems arise. Both problems are due to the fast growth of the 
search space. The first problem is how to select the right current clause in a 
huge set of passive clauses. This problem can be solved only by a large number 
of experiments over a large collection of problems. The most common approach
is to maintain several priority queues on passive clauses, and pick clauses from
these priority queues. A very popular technique is to maintain two queues in which
the clauses are prioritized by their weight and age, respectively. The clauses are
    input: init : set of clauses ;
    var active, passive, new : sets of clauses ;
    var current : clause ;
    active := ∅ ;
    passive := init ;
    while passive ≠ ∅ do
        current := select(passive) ;
        passive := passive − {current} ;
        active := active ∪ {current} ;
        new := infer(current, active) ;
        if goal_found(new) then return provable ;
        inner_simplify(new) ;
        simplify(new, active ∪ passive) ;
        if goal_found(new) then return provable ;
        simplify(active, new) ;
        simplify(passive, new) ;
        if goal_found(active ∪ passive) then return provable ;
        passive := passive ∪ new
    od ;
    return unprovable

Fig. 1. A Given Clause Algorithm



picked from the queues using the so-called age-weight ratio (also called the
pick-given ratio); for example, if the ratio is 1:5, then out of every 6 picked clauses,
1 will be selected as the oldest clause, and 5 as the lightest clauses. The prover
E maintains more than two queues. Even this simple clause selection scheme
creates many problems when clauses must be deleted from the search space (for
example, when the available memory is exhausted), since every clause may be
kept in several priority queues. The best clause selection strategies are not yet
well understood.
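The age-weight ratio can be sketched with two binary heaps over the same set of passive clauses. This is only an illustration: the class name, the method names, and the use of lazy deletion (a clause taken through one queue is skipped when it later surfaces in the other) are assumptions of this sketch, not a description of any particular prover.

```python
import heapq
from itertools import count

class PassiveSet:
    """Passive clauses kept in an age queue and a weight queue (hypothetical sketch)."""

    def __init__(self, age=1, weight=5):
        self.age, self.weight = age, weight
        self.by_age, self.by_weight = [], []
        self.clauses = {}            # id -> clause, for clauses still passive
        self.ids = count()           # increasing ids double as clause age
        self.step = 0

    def add(self, clause, weight):
        cid = next(self.ids)
        self.clauses[cid] = clause
        heapq.heappush(self.by_age, (cid,))
        heapq.heappush(self.by_weight, (weight, cid))

    def select(self):
        # Out of every (age + weight) picks, the first `age` go to the oldest clause.
        use_age = self.step % (self.age + self.weight) < self.age
        self.step += 1
        queue = self.by_age if use_age else self.by_weight
        while queue:
            cid = heapq.heappop(queue)[-1]
            if cid in self.clauses:  # lazy deletion of entries already selected
                return self.clauses.pop(cid)
        raise IndexError("no passive clauses")
```

With a ratio of 1:5 and clauses a to g added in that order with weights 9, 3, 7, 1, 5, 2, 8, the selection order is a (oldest), then d, f, b, e, c (lightest), then g (oldest remaining). The stale entries left behind in the other queue are exactly the bookkeeping problem the paragraph above mentions for clause deletion.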

Another serious problem is caused by the proliferation of passive clauses. A 
large number of passive clauses results in deterioration of the proof-search speed 
and huge memory consumption. To illustrate this, we provide statistics on an un- 
successful run of Vampire with the time limit of 1 minute on the TPTP problem 
ANA003-1. During this run, 261,573 clauses were generated. The overall number
of active clauses was 1,967, the overall number of passive clauses 236,389. To
cope with the problems of passive clauses, several solutions have been proposed.
One example is to impose a weight limit on clauses and to increase the weight limit
if the prover terminates unsuccessfully with the current weight limit.

The most radical solution to the passive clauses problem was originally
implemented in DISCOUNT [2] and is now used in Waldmeister and E (and can
be used as an option in Spass and Vampire). These provers use a different main
loop, in which passive clauses do not participate in backward simplifications
until they are selected as the current clause. In fact, these provers do not store the
passive clauses at all, but only store information about the inference rule used
to obtain a passive clause. This implementation scheme requires double forward
simplification, but is believed to be space- and time-efficient. The provers
implementing the DISCOUNT loop are essentially based on the principle: process
active clauses efficiently. The price to pay is that some very useful simplifying
inferences involving a passive clause can be considerably delayed until the
clause has been selected as current. The main principle of Vampire with respect
to the main loop is: do simplifications eagerly and non-simplifying inferences
lazily. Therefore, Vampire’s default option is the Otter main loop. However, to
work with the large number of passive clauses efficiently, the so-called limited
resource strategy was invented [71]. This strategy is applicable when the time limit
for solving a problem is specified in advance. Vampire tries to estimate which
clauses cannot be processed by the end of the time limit at all, and discards
such clauses as useless. Experiments reported in [71] show high efficiency of the
limited resource strategy as compared to other approaches.

4 Term Indexing 

To be able to process hundreds of thousands of clauses in less than a minute, all
the most important operations should ideally be implemented on a set-at-a-time
basis. For example, subsumption is NP-complete, and checking every newly
generated clause for subsumption against several hundred thousand kept clauses
sequentially is hopelessly slow. There is a growing number of papers on term
indexing (see [32,81] for an overview): an approach that allows one to
implement efficiently expensive massive operations on terms and clauses, for example,
subsumption of one clause by a large database of clauses.

The problem of term indexing can be formulated abstractly as follows (see
[81]). Given a set L of indexed terms (or clauses), a binary relation R over terms
(called the retrieval condition) and a term t (called the query term), identify
the subset M of L that consists of the terms l such that R(l,t) holds. Terms in
M will be called the candidate terms. Typical retrieval conditions used in
first-order theorem proving are matching, generalization, unifiability, subsumption,
syntactic equality, variance etc. Such a retrieval of candidate terms in theorem
proving is interleaved with insertion of terms into L, and deletion of them from L.
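The abstract formulation can be written down directly as a sequential scan, which is the baseline that every index tries to beat. In this sketch (an illustration only) variables are strings, constants and compound terms are tuples whose first component is the symbol, and the retrieval condition is generalization: l is a generalization of t if some substitution maps l to t. The names match, is_generalization and retrieve are hypothetical.

```python
def match(pattern, term, subst=None):
    """Return a substitution s such that pattern instantiated by s equals term,
    or None if there is none. Variables of `term` are treated as constants."""
    subst = {} if subst is None else subst
    if isinstance(pattern, str):                      # pattern is a variable
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if isinstance(term, str) or pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst

def is_generalization(l, t):
    """Retrieval condition R(l, t): the indexed term l generalizes the query t."""
    return match(l, t) is not None

def retrieve(indexed, relation, query):
    """Naive retrieval: test the retrieval condition on every indexed term."""
    return [l for l in indexed if relation(l, query)]
```

For the indexed set f(X,a), f(X,X), f(b,Y), g(X) and the query f(b,a), the candidates are f(X,a) and f(b,Y). The scan costs one matching attempt per indexed term, which is what becomes prohibitive for very large sets.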

In order to support rapid retrieval of candidate terms, we need to process 
the indexed set into a data structure called the index. Indexing data structures 
are well-known to be crucial for the efficiency of the current state-of-the-art 
theorem provers. Term indexing is also used in logic and functional program- 
ming languages implementation, but indexing in theorem provers has several 
distinctive features: 

1. Indexes in theorem provers frequently store 10^-10® complex terms, unlike 
a typically small number of shallow terms in functional and logic programs. 

2. In logic or functional language implementation the index is usually
constructed during compilation. On the contrary, indexes in theorem proving
are highly dynamic, since terms are frequently inserted in and deleted from
indexes. Index maintenance operations start with an index for an initial set
of terms L, and incrementally construct an index for another set L' that is
obtained by insertion or deletion of terms to or from L.

3. In many applications it is desirable for several retrieval operations to work 
on the same index structure in order to share maintenance overhead and 
memory consumption. 

Therefore, over the last two decades a significant number of results on new
indexing techniques for theorem proving have been published and successfully
applied in different provers [81,32,33,69,84,64,49,13,30,31,93,76,72,26,57].

For every retrieval condition used in theorem provers, it is not necessary to 
retrieve exactly all candidates satisfying this retrieval condition. If the retrieval 
condition is used for inferences (as, e.g., unification is used for inferences by 
resolution and paramodulation), then it is enough to retrieve a superset of can- 
didates, and then check the retrieval condition for every member of this superset. 
If the retrieval condition is used for simplifications (as, e.g., retrieval of gener- 
alizations is used for simplification by unit equalities or forward subsumption 
by unit clauses), then it is enough to retrieve a subset of candidates only, since 
simplifications only reduce the search space, but do not influence completeness. 
If a particular term indexing technique retrieves exactly all candidates, then 
this technique is said to perform perfect filtering. It is always an issue of debate 
in automated deduction, whether perfect filtering should be implemented for a 
particular retrieval condition. For example, [90] advocates the use of imperfect 
filtering for subsumption. 
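The difference between perfect and imperfect filtering can be illustrated with a deliberately crude index that buckets terms by their top symbol and arity. It returns a superset of the generalizations of a query, and the retrieval condition is then checked on each candidate. This is a sketch under the same term representation as before (variables are strings, other terms are tuples headed by their symbol); the class TopSymbolIndex and its methods are hypothetical and far simpler than the indexing techniques cited in the text.

```python
from collections import defaultdict

def match(pattern, term, subst):
    """One-sided matching: substitution mapping pattern to term, or None."""
    if isinstance(pattern, str):
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if isinstance(term, str) or pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst

def key(term):
    """Bucket key: '*' for a variable, otherwise (top symbol, arity)."""
    return "*" if isinstance(term, str) else (term[0], len(term) - 1)

class TopSymbolIndex:
    def __init__(self):
        self.buckets = defaultdict(list)

    def insert(self, term):
        self.buckets[key(term)].append(term)

    def candidates(self, query):
        """Imperfect filter: a superset of the generalizations of query."""
        keys = ["*"] if isinstance(query, str) else ["*", key(query)]
        return [t for k in keys for t in self.buckets.get(k, [])]

    def generalizations(self, query):
        """Exact answer: check the retrieval condition on every candidate."""
        return [t for t in self.candidates(query) if match(t, query, {}) is not None]
```

For the query f(b,a) against an index holding f(X,a), f(X,X), f(b,Y), g(X), the variable Z and f(a), the filter returns four candidates, of which f(X,X) fails the final check. A perfect filter would never return it in the first place.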

Term indexing is one of the main research directions for Vampire. We believe
that perfect filtering is desirable for all operations for which indexing is required
at all. As a consequence, Vampire stores a large number of indexes for various
operations: partially adaptive code trees [93,72] for forward subsumption,
subsumption resolution, and variance check; code trees with precompiled ordering
constraints for forward rewriting (also called demodulation) by unit equalities;
path indexing with compiled database joins implemented using skip lists for
backward subsumption and backward demodulation by unit equalities [73]; tries
for unification used to implement resolution and paramodulation; and another
kind of tries for storing perfectly shared terms. We learned that it pays off to
spend time on implementing new indexing techniques: for nearly every retrieval
condition there are problems for which this retrieval condition contributes to the
running time considerably. We believe that term indexing will be at the heart of
theorem proving research in the future.

It is due to term indexing that the modern provers can quickly solve prob- 
lems which require search in a space of several million complex clauses. But it is 
also due to term indexing that implementation of new features on top of existing 
architectures requires non-trivial efforts and invention of completely new
techniques. Every new feature brings in a tradeoff between the time/space gained by
the use of the feature on the one hand, and the time spent on checking applicability
of this feature and the space needed to implement the feature efficiently on the
other. In addition, some promising features require a non-trivial implementation.




Algorithms, Datastructures, and Other Issues 



As a result, some enhancements of theorem provers well-known in theory have
never been implemented. A typical example is the basic strategy [17,7,60].
Although in theory basic superposition avoids some redundant
inferences, in practice implementation of basic superposition requires considerable
changes in all algorithms and datastructures: all indexing techniques
and all algorithms for term retrieval must be adapted to the basic strategy. As
a consequence, the basic strategy has never been fully implemented.

5 Building-In Equational Theories 

We are in search of techniques that can speed up the provers by several orders of
magnitude. Building in equational theories is one such technique. Many problems
coming from applications are theorems about structures axiomatized by a set
of equations. The most common equational theory is AC: the theory containing
associativity and commutativity axioms for a function symbol.
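As a small illustration of what working modulo AC means, two terms can be tested for equality modulo AC by flattening nested applications of an AC symbol and sorting the arguments. This sketch covers only AC-equality, not AC-matching or AC-unification. The representation (strings for variables and constants, tuples headed by a symbol for compound terms) and the function name ac_normal are assumptions of the example.

```python
def ac_normal(term, ac_symbols):
    """Flatten and sort the arguments of AC symbols, bottom-up.
    Two terms are equal modulo AC iff their normal forms are identical."""
    if isinstance(term, str):
        return term
    sym, *args = term
    args = [ac_normal(a, ac_symbols) for a in args]
    if sym in ac_symbols:
        flat = []
        for a in args:
            # Arguments are already normalized, so one level of flattening suffices.
            if isinstance(a, tuple) and a[0] == sym:
                flat.extend(a[1:])
            else:
                flat.append(a)
        args = sorted(flat, key=repr)   # any fixed total order on terms will do
    return (sym, *args)
```

For example, with + declared AC, both a + (b + c) and (c + a) + b normalize to the flattened term +(a, b, c), while f(a, b) and f(b, a) stay distinct for a free symbol f.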

Although the idea of built-in equational theories has been around at least
since 1972 [68], the first implementation of AC was undertaken more than 20
years later in the EQP equational theorem prover [48]. This implementation
resulted in probably the most celebrated event in the automated deduction
community: the automatic solution of the Robbins problem [47] by EQP.

Although EQP ran for 8 days to obtain the proof, the proof-search for the 
Robbins problem was remarkably small. The total number of equations processed 
during the proof was less than 50,000. Modern provers often process such a 
number of clauses in a matter of a few seconds. This shows that the Robbins
problem may in fact not be very difficult for a prover in which AC is implemented
efficiently, including term indexing modulo AC.

So far research on built-in equational theories has mainly concentrated on
equational unification (see e.g., [4] for a recent overview). The special case of
AC-unification was discussed in a number of papers, dating back to [85,86], but
only recently have efficient algorithms been described [1]. Experience with
non-AC theorem proving shows that efficient matching is more important
than efficient unification,^ but there is essentially no literature on efficient
AC-matching. Moreover, there are essentially no publications on term indexing in
the presence of built-in theories. The only paper we know about discusses a special
case of indexing for AC-matching of linear terms [5]. Vampire has commutativity
built into some term indexes and retrieval algorithms (see e.g., [73]).

We conclude that efficient algorithms and datastructures for built-in equa- 
tional theories should become a major topic for research in automated deduction. 

6 Cheap Substitutes 

When the price to pay for implementing a particular algorithm is too high,
cheap substitutes can be used. This is especially true for the cases when
completeness is not an issue, for example when implementing simplification rules.
There are several examples of successful implementation of “cheap substitutes”
for expensive operations. For example, Waldmeister implements specialized
algorithms for AC-completion [3]. In resolution theorem proving it is desirable to
quickly check whether two clauses have a resolvent which subsumes
one of these clauses. Deriving such a resolvent simplifies the search space, since the
resolvent will then replace one of its parents. However, no efficient algorithm
is known for finding simplified resolvents. There is an incomplete but relatively
cheap operation called subsumption resolution (see e.g. [6]) which can be
implemented essentially without extra overhead, provided that efficient subsumption
is implemented. Vampire implements subsumption resolution using the indexes
for subsumption, but slightly modified algorithms. Another example of a cheap
substitute is the implementation of splitting in Vampire [74] in the form of
splitting without backtracking. Though it does not have the full power of the splitting
rule as implemented, e.g., in Spass [95], splitting without backtracking can be
implemented without radical changes in the architecture of a theorem prover.

^ Indeed, only 0.1% of the EQP running time on the Robbins problem was spent for
AC-unification.
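In the ground case the idea of subsumption resolution fits in a few lines: if resolving a clause C with another clause D on a literal L yields a resolvent that is a subset of C, then C can be replaced by C without L. The sketch below is limited to ground clauses (frozensets of nonzero integers, negative for negated atoms); the first-order version needs matching instead of the subset test and is not shown. The function name is hypothetical.

```python
def subsumption_resolution(clause, others):
    """Remove every literal lit from `clause` for which some clause d in `others`
    contains -lit and otherwise only literals of `clause` (ground case only)."""
    clause = frozenset(clause)
    others = [frozenset(d) for d in others]
    changed = True
    while changed:
        changed = False
        for lit in sorted(clause):       # fixed order keeps the result deterministic
            rest = clause - {lit}
            if any(-lit in d and (d - {-lit}) <= rest for d in others):
                clause = rest            # the resolvent subsumes the old clause
                changed = True
                break
    return clause
```

For instance, {1, 2, 3} simplified by {-1, 2} becomes {2, 3}. The subset test is the same one used for ground subsumption, which is why the operation comes almost for free once subsumption is available.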



7 Constraints 

The use of symbolic constraints in automated deduction gives stronger notions
of redundancy than those formulated without constraints [62]. In addition, it
is known that symbolic constraints may give a compact representation of large
search spaces, for example when constraints modulo AC are used [61] to encode
a doubly exponential number of AC-unifiers [36].

It is conjectured in [56] that deduction with symbolic constraints will be a
major research topic in efficient automated deduction and can result in a
major breakthrough in efficiency. However, ten years after the first publication on
deduction with symbolic constraints [37] there is still no implementation. The
problem is that algorithms and datastructures for solving symbolic constraints
are not yet developed, except for some particular operations. The most advanced
results on symbolic constraints are in the area of solving constraints over
simplification orderings. The first algorithms for solving RPO ordering constraints
were described in [14,35], followed by a number of results on solving RPO
ordering constraints [55,15,54], but only recently has an efficient algorithm been designed
[58]. In the case of Knuth-Bendix ordering constraints, the decidability of
constraint solving was proved in 1999 [38,39], but no simple algorithms have yet
been described.

As shown in [94], good algorithms for solving ordering constraints are not 
enough for implementing constraint-based deduction. Efficient algorithms and 
datastructures should be developed also for constraint simplification and approx- 
imation. Moreover, first-order theories of ordering constraints are to be better 
understood. 
8 Comparison of Algorithms and Datastructures

One of the main problems in research on algorithms and datastructures for au- 
tomated deduction is that the algorithms are performed on terms and clauses, 
i.e., tree-like structures. It was observed that the worst-case complexity results 
for such structures can be inadequate in practice. For example, most provers use
Robinson’s unification algorithm [77], which has worst-case exponential
complexity, instead of efficient linear algorithms [65,46], since the overhead of maintaining
datastructures for these algorithms does not pay off in practice, where the
exponential behavior of Robinson’s algorithm does not show up. Subsumption is
NP-complete [9,27], but the modern provers often make subsumption-checks of 
10® clauses against a database of lO"* clauses in a few seconds. 
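For reference, a recursive unification procedure in the style of Robinson's algorithm is very short, which is part of its practical appeal. The sketch below is an illustration, not the code of any prover: variables are strings, constants and compound terms are tuples headed by their symbol, and the substitution is kept in triangular form (bindings may mention other bound variables), which is a design choice of this example.

```python
def walk(t, s):
    """Follow variable bindings in the substitution s until t is unbound."""
    while isinstance(t, str) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    """Occurs check: does variable v occur in term t under substitution s?"""
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:])

def unify(a, b, s=None):
    """Return a (triangular) unifier of a and b extending s, or None."""
    s = {} if s is None else s
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(a, str):
        return None if occurs(a, b, s) else {**s, a: b}
    if isinstance(b, str):
        return unify(b, a, s)
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s

def resolve(t, s):
    """Apply a triangular substitution exhaustively to t."""
    t = walk(t, s)
    return t if isinstance(t, str) else (t[0], *(resolve(a, s) for a in t[1:]))
```

Unifying f(X, g(Y)) with f(a, g(X)) binds both X and Y to a, while X and f(X) fail the occurs check. No auxiliary graph or union-find structure has to be maintained, which is the overhead the linear algorithms pay for their better worst case.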

A practical approach to comparing implementation of algorithms used in 
first-order automated deduction was recently undertaken in [57]. The essence of 
this approach is that the implementation techniques are compared on bench- 
marks taken from runs of theorem provers on real problems. It is likely that the 
methodology of [57] will be used for comparison of other important algorithms 
used in automated deduction. 

9 Other Aspects of Efficient Theorem Proving 

In this section we briefly overview aspects of efficient theorem proving other than 
those directly related to algorithms and datastructures. 



9.1 Non-resolution Theorem Proving 

Resolution is not the only automated reasoning method used in first-order au- 
tomated deduction. Model elimination [41] implemented in SETHEO [40] often 
performs very well on problems difficult for resolution-based provers. A recent 
adaptation of propositional splitting to the full first-order case [8] seems to be 
promising. However, non-resolution based procedures often have difficulties with 
equality and other built-in theories, as witnessed by the results overviewed in 
[21]. There have been several proposals on combination of paramodulation-based
reasoning and tableau-based reasoning [18,19,52], but none of them has been
implemented.



9.2 Parallel and Agent-Based Reasoning 

Parallel computing is becoming cheaper. Networks of computers are now readily 
available. This makes parallel and agent-based theorem proving attractive. There 
were early projects aiming at parallelizing theorem provers, both in the context 
of model elimination [80] and resolution [45]. Distributed theorem proving was 
considered e.g., in [53]. However, parallelization of theorem proving, especially 
resolution-based, requires further investigation and experiments. 




A. Voronkov 



Theorem proving in which provers are considered as communicating agents 
was considered in [22] . It was shown that one can obtain a considerable speedup 
by running several provers in parallel and making them communicate by sending 
each other heuristically selected derived clauses. 

There are several projects which put together several provers in different
forms, for example, providing a common interface for running them over the Web.
Examples are MathWeb [24,23], MBASE [25], SystemOnTPTP [88]. However,
the emphasis in such systems has so far been on providing a graphical user interface or
interactive theorem proving, but not so much on efficient automated deduction.

9.3 Other Research Directions 

There are many other aspects of efficient theorem proving not considered in this 
paper. We will mention some of them very briefly. 

In the future, modern first-order theorem provers will be tightly integrated with
other systems for automated reasoning, for example inductive theorem provers
(see [12] for an overview), and proof assistants such as Isabelle [66] and HOL [28],
and maybe model checkers. It is likely that first-order provers integrated in such
systems will also partially implement proof-search specific to these systems, for
example restricted forms of induction, proofs about inductively defined types,
or even restricted forms of higher-order theorem proving. This will require new
lines of research, for example

— saturation-based higher-order theorem proving [10,11]; 

— intelligent work with definitions [67,20]; 

— built-in data types; 

— recognition of irrelevant axioms; 

— propositional reasoning;

— built-in theories (not only equational) [83];

— reasoning with non-standard quantifiers, for example “there exists at most 
n”; 

— finite domain reasoning. 
Abstract. In this paper the description logic $\mathcal{ALCNH}_{R^+}(\mathcal{D})^-$ is
introduced. Prominent language features beyond conjunction, full negation,
and quantifiers are number restrictions, role hierarchies, transitively
closed roles, generalized concept inclusions, and concrete domains. As in
other languages based on concrete domains (e.g. $\mathcal{ALC}(\mathcal{D})$) a so-called
existential predicate restriction is provided. However, compared to $\mathcal{ALC}(\mathcal{D})$
only features and no feature chains are allowed in this operator. This
results in a limited expressivity w.r.t. concrete domains but is required to
ensure the decidability of the language. We show that the results can be
exploited for building practical description logic systems for solving e.g.
configuration problems.



1 Introduction 

In the field of knowledge representation, description logics (DLs) have been
proven to be a sound basis for solving application problems. An application
domain where DLs have been successfully applied is configuration (see [9] for an
early publication). The main notions for domain modeling are concepts (unary
predicates) and roles (binary predicates). Furthermore, a set of axioms (also
called TBox) is used for modeling the terminology of an application. Knowledge
about specific individuals and their interrelationships is modeled with a set of
additional axioms (so-called ABox).

Experiences with description logics in applications indicate that negation,
existential and universal restrictions, transitive roles, role hierarchies, and
number restrictions are required to solve practical modeling problems without
resorting to ad hoc extensions. A description logic which provides these language
constructs is, for instance, $\mathcal{ALCNH}_{R^+}$ [5]. The optimized DL knowledge
representation system RACE [4] provides an optimized implementation for ABox
reasoning in $\mathcal{ALCNH}_{R^+}$. With the optimized implementation of RACE,
practical systems based on description logics can be built. However, it is well-known
that, in addition to the language constructs mentioned above, reasoning about
objects from other domains (so-called concrete domains, e.g. for the reals) is very
important for practical applications as well. In [1] the description logic $\mathcal{ALC}(\mathcal{D})$ is
investigated and it is shown that, provided a decision procedure for the concrete
domain $\mathcal{D}$ exists, the logic $\mathcal{ALC}(\mathcal{D})$ is decidable. In this paper, an extension of
the $\mathcal{ALCNH}_{R^+}$ knowledge representation system RACE with concrete domains
is investigated.

Unfortunately, adding concrete domains (as proposed in the original
approach) to expressive description logics might lead to undecidable inference
problems. For instance, in [2] it is proven that the logic $\mathcal{ALC}(\mathcal{D})$ plus an operator for
the transitive closure of roles can be undecidable if expressive concrete domains
are considered. $\mathcal{ALCNH}_{R^+}$ offers transitive roles but no operator for the
transitive closure of roles. In [8] it is shown that $\mathcal{ALC}(\mathcal{D})$ with generalized inclusion
axioms (GCIs) can be undecidable. Even if GCIs were not allowed in $\mathcal{ALCNH}_{R^+}$,
$\mathcal{ALCNH}_{R^+}$ with concrete domains would be undecidable (in general) because
$\mathcal{ALCNH}_{R^+}$ offers role hierarchies and transitive roles, which provide the same
expressivity as GCIs. With role hierarchies it is possible to (implicitly) declare
a universal role, which can be used in combination with a value restriction to
achieve the same effect as with GCIs. Decidability results can only be obtained
for “trivial” concrete domains, which are hardly useful in practical applications.
Thus, if termination and soundness of, for instance, a concept consistency
algorithm are to be retained, there is no way of extending an $\mathcal{ALCNH}_{R^+}$ DL system
such as RACE with concrete domains as in $\mathcal{ALC}(\mathcal{D})$ without losing completeness.

Thus, $\mathcal{ALCNH}_{R^+}$ can only be extended with concrete domain operators
with limited expressivity. In order to support practical modeling requirements
at least to some extent, we pursue a pragmatic approach by supporting only
features (and no feature chains as in $\mathcal{ALC}(\mathcal{D})$, for details see [1] and below). The
resulting language is called $\mathcal{ALCNH}_{R^+}(\mathcal{D})^-$. By proving soundness and
completeness (and termination) of a tableaux calculus, the decidability of inference
problems w.r.t. the language $\mathcal{ALCNH}_{R^+}(\mathcal{D})^-$ is proved. As shown in this
paper, $\mathcal{ALCNH}_{R^+}(\mathcal{D})^-$ can be used, for instance, as a basis for building practical
application systems for solving configuration problems.



2 The Description Logic $\mathcal{ALCNH}_{R^+}(\mathcal{D})^-$

The description logic $\mathcal{ALCNH}_{R^+}(\mathcal{D})^-$ provides conjunction, full negation,
quantifiers, number restrictions, role hierarchies, transitively closed roles and concrete
domains. In addition to the operators known from $\mathcal{ALCNH}_{R^+}$, a restricted
existential predicate restriction operator for concrete domains is supported.
Furthermore, we assume that the unique name assumption holds for the individuals
explicitly mentioned in an ABox.

We briefly introduce the syntax and semantics of the DL $\mathcal{ALCNH}_{R^+}(\mathcal{D})^-$.
We assume five disjoint sets: a set of concept names $C$, a set of role names $R$, a set
of feature names $F$, a set of individual names $O$ and a set of names for (concrete)
objects $O_C$. The mutually disjoint subsets $P$ and $T$ of $R$ denote non-transitive
and transitive roles, respectively ($R = P \cup T$). The language $\mathcal{ALCNH}_{R^+}(\mathcal{D})^-$ is
introduced in Figure 1 using a standard Tarski-style semantics with an
interpretation $\mathcal{I}_\mathcal{D} = (\Delta_\mathcal{I}, \Delta_\mathcal{D}, \cdot^\mathcal{I})$ where $\Delta_\mathcal{I} \cap \Delta_\mathcal{D} = \emptyset$ holds. A variable assignment $\alpha$
maps concrete objects to values in $\Delta_\mathcal{D}$.
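Concepts ($R \in R$, $S \in S$, $f \in F$)

Syntax                        Semantics
$A$                           $A^\mathcal{I} \subseteq \Delta_\mathcal{I}$
$\neg C$                      $\Delta_\mathcal{I} \setminus C^\mathcal{I}$
$C \sqcap D$                  $C^\mathcal{I} \cap D^\mathcal{I}$
$C \sqcup D$                  $C^\mathcal{I} \cup D^\mathcal{I}$
$\exists R.C$                 $\{a \in \Delta_\mathcal{I} \mid \exists b \in \Delta_\mathcal{I} : (a,b) \in R^\mathcal{I},\ b \in C^\mathcal{I}\}$
$\forall R.C$                 $\{a \in \Delta_\mathcal{I} \mid \forall b \in \Delta_\mathcal{I} : (a,b) \in R^\mathcal{I} \Rightarrow b \in C^\mathcal{I}\}$
$\exists_{\geq n}\, S$        $\{a \in \Delta_\mathcal{I} \mid \|\{b \in \Delta_\mathcal{I} \mid (a,b) \in S^\mathcal{I}\}\| \geq n\}$
$\exists_{\leq m}\, S$        $\{a \in \Delta_\mathcal{I} \mid \|\{b \in \Delta_\mathcal{I} \mid (a,b) \in S^\mathcal{I}\}\| \leq m\}$
$\exists f_1,\ldots,f_n.P$    $\{a \in \Delta_\mathcal{I} \mid \exists x_1,\ldots,x_n \in \Delta_\mathcal{D} : (a,x_1) \in f_1^\mathcal{I}, \ldots, (a,x_n) \in f_n^\mathcal{I},\ (x_1,\ldots,x_n) \in P^\mathcal{I}\}$
$\forall f.\bot_\mathcal{D}$  $\{a \in \Delta_\mathcal{I} \mid \neg\exists x_1 \in \Delta_\mathcal{D} : (a,x_1) \in f^\mathcal{I}\}$

Roles and Features

Syntax                        Semantics
$R$                           $R^\mathcal{I} \subseteq \Delta_\mathcal{I} \times \Delta_\mathcal{I}$
$f$                           $f^\mathcal{I} : \Delta_\mathcal{I} \to \Delta_\mathcal{D}$ (features are partial functions)

$A$ is a concept name and $\|\cdot\|$ denotes the cardinality of a set ($n, m \in \mathbb{N}$, $n > 0$).

Axioms

Syntax                        Satisfied if
$R \in T$                     $R^\mathcal{I} = (R^\mathcal{I})^+$
$R \sqsubseteq S$             $R^\mathcal{I} \subseteq S^\mathcal{I}$
$C \sqsubseteq D$             $C^\mathcal{I} \subseteq D^\mathcal{I}$

Assertions ($a, b \in O_O$, $x, x_i \in O_C$)

Syntax                        Satisfied if
$a : C$                       $a^\mathcal{I} \in C^\mathcal{I}$
$(a,b) : R$                   $(a^\mathcal{I}, b^\mathcal{I}) \in R^\mathcal{I}$
$(a,x) : f$                   $(a^\mathcal{I}, \alpha(x)) \in f^\mathcal{I}$
$(x_1,\ldots,x_n) : P$        $(\alpha(x_1),\ldots,\alpha(x_n)) \in P^\mathcal{I}$

Fig. 1. Syntax and Semantics of $\mathcal{ALCNH}_{R^+}(\mathcal{D})^-$.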



If $R, S \in \mathbf{R}$ are role names, then $R \sqsubseteq S$ is called a role inclusion axiom. A
role hierarchy $\mathcal{R}$ is a finite set of role inclusion axioms. Then, we define $\sqsubseteq^*$
as the reflexive transitive closure of $\sqsubseteq$ over such a role hierarchy $\mathcal{R}$. Given
$\sqsubseteq^*$, the set of roles $R^{\downarrow} = \{S \in \mathbf{R} \mid S \sqsubseteq^* R\}$ defines the sub-roles of a role $R$ and
$R^{\uparrow} = \{S \in \mathbf{R} \mid R \sqsubseteq^* S\}$ defines the super-roles of a role. We also define the set
$\mathbf{S} := \{R \in \mathbf{R} \mid R^{\downarrow} \cap \mathbf{T} = \emptyset\}$ of simple roles that are neither transitive nor have a
transitive role as a sub-role.
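The definitions of sub-roles, super-roles and simple roles can be illustrated with a small sketch. It is only an illustration under assumptions: roles are plain strings, a role inclusion $S \sqsubseteq R$ is encoded as the pair `(S, R)`, and all function names are hypothetical, not taken from the paper or from any system.

```python
from itertools import product


def reflexive_transitive_closure(roles, inclusions):
    """Return all pairs (s, r) with s ⊑* r for the given role inclusions."""
    closure = {(r, r) for r in roles} | set(inclusions)
    changed = True
    while changed:
        changed = False
        # Join pairs (a, b) and (b, d) into (a, d) until nothing new appears.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def sub_roles(role, closure):
    """All S with S ⊑* role (includes the role itself)."""
    return {s for (s, r) in closure if r == role}


def super_roles(role, closure):
    """All S with role ⊑* S (includes the role itself)."""
    return {r for (s, r) in closure if s == role}


def simple_roles(roles, transitive, closure):
    """Roles that are not transitive and have no transitive sub-role."""
    return {r for r in roles if not (sub_roles(r, closure) & set(transitive))}
```

Because the closure is reflexive, a transitive role is its own sub-role, so a single intersection test excludes both transitive roles and roles with a transitive sub-role.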

The concept language of $\mathcal{ALCNH}_{R^+}$ is syntactically restricted with respect
to the combination of number restrictions and transitive roles. Number restrictions
are only allowed for simple roles. This restriction is motivated by a known
undecidability result in case of an unrestricted syntax [7]. The set of individuals
is divided into two subsets, the set of so-called “old” individuals $\mathbf{O}_O$ and the
set of “new” individuals $\mathbf{O}_N$. Every individual name from $\mathbf{O}$ is mapped to a
single element of $\Delta_{\mathcal{I}}$ in a way such that for $a, b \in \mathbf{O}_O$, $a^{\mathcal{I}} \neq b^{\mathcal{I}}$ if $a \neq b$ (unique
name assumption). Only old individuals may be mentioned in an ABox (new
individuals are generated by the completion rules introduced below).

In accordance with [1] we also define the notion of a concrete domain. A
concrete domain $\mathcal{D}$ is a pair $(\Delta_{\mathcal{D}}, \Phi_{\mathcal{D}})$, where $\Delta_{\mathcal{D}}$ is a set called the domain, and
$\Phi_{\mathcal{D}}$ is a set of predicate names. The interpretation function maps each predicate
name $P$ from $\Phi_{\mathcal{D}}$ with arity $n$ to a subset $P^{\mathcal{I}}$ of $\Delta_{\mathcal{D}}^n$. Concrete objects from $\mathbf{O}_C$
are mapped to an element of $\Delta_{\mathcal{D}}$. We assume that $\bot_{\mathcal{D}}$ is the negation of the
predicate $\top_{\mathcal{D}}$.

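A concrete domain $\mathcal{D}$ is called admissible iff the set of predicate names $\Phi_{\mathcal{D}}$ is
closed under negation and $\Phi_{\mathcal{D}}$ contains a name $\top_{\mathcal{D}}$ for $\Delta_{\mathcal{D}}$, and the satisfiability
problem $P_1^{n_1}(x_{11}, \ldots, x_{1n_1}) \wedge \ldots \wedge P_m^{n_m}(x_{m1}, \ldots, x_{mn_m})$ is decidable ($m$ is finite,
$P_i^{n_i} \in \Phi_{\mathcal{D}}$, $n_i$ is the arity of $P_i$, and $x_{jk}$ is a concrete object).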

If $C$ and $D$ are concept terms, then $C \sqsubseteq D$ (generalized concept inclusion or
GCI) is a terminological axiom. A finite set of terminological axioms $\mathcal{T}_{\mathcal{R}}$ is called
a terminology or TBox w.r.t. a given role hierarchy $\mathcal{R}$. For brevity, the reference
to $\mathcal{R}$ is omitted in the following. An ABox $\mathcal{A}$ is a finite set of assertional axioms
as defined in Figure 1.

An interpretation $\mathcal{I}$ is a model of a concept $C$ (or satisfies a concept $C$) iff
$C^{\mathcal{I}} \neq \emptyset$. An interpretation is a model of a TBox $\mathcal{T}$ iff it satisfies all axioms in
$\mathcal{T}$. See Figure 1 for the satisfiability conditions. An interpretation is a model of
an ABox $\mathcal{A}$ w.r.t. a TBox $\mathcal{T}$ iff it is a model of $\mathcal{T}$ and satisfies all assertions in
$\mathcal{A}$. Different individuals are mapped to different domain objects (unique name
assumption). Note that features are interpreted differently from features in [1].

A concept $C$ is called consistent (w.r.t. a TBox $\mathcal{T}$) iff there exists a model
of $C$ (that is also a model of $\mathcal{T}$). An ABox $\mathcal{A}$ is consistent (w.r.t. a TBox $\mathcal{T}$) iff
$\mathcal{A}$ has a model $\mathcal{I}$ (which is also a model of $\mathcal{T}$). A knowledge base $(\mathcal{T}, \mathcal{A})$ is called
consistent iff there exists a model.

3 Solving an Application Problem with $\mathcal{ALCNH}_{R^+}(\mathcal{D})^-$

According to [3] configuration problem solving processes can be formalized as 
synthesis inference tasks. Following this approach, a solution of a configuration 
task is defined to be a (logical) model of the given knowledge base consisting 
of both the conceptual domain model (TBox) as well as the task specification 
(ABox). The TBox and the role hierarchy describe the configuration space. 

For instance, in a technical domain, the concept of a cylinder might be defined
as follows. A Cylinder is required to be a Motorpart, to be part_of a Motor, to
have a displacement of 1 to 1000 ccm, and to have a set of 4 to 6 parts (role
has_part) which are all instances of Cylinderpart, and it consists of exactly 1 Piston,
exactly 1 Piston_Rod, and 2 to 4 Valves. This expression can be transformed to a
terminological inclusion axiom of a description logic providing concrete domains.
Let the concrete domain $\Re$ be defined as in [1]: $\Re = (\mathbb{R}, \Phi_{\Re})$ where $\Phi_{\Re}$ is a set of
predicates which are based on polynomial equations or inequations. The concrete
domain $\Re$ is admissible (see also [1]). A TBox $\mathcal{T}$ is defined as follows:



has_cylinder_part $\sqsubseteq$ has_part,      has_piston_part $\sqsubseteq$ has_part,
has_piston_rod_part $\sqsubseteq$ has_part,    has_valve_part $\sqsubseteq$ has_part

$\top \sqsubseteq \forall$ has_cylinder_part . Cylinder,       $\top \sqsubseteq \forall$ has_piston_part . Piston,
$\top \sqsubseteq \forall$ has_piston_rod_part . Piston_Rod,   $\top \sqsubseteq \forall$ has_valve_part . Valve

In the first block, relationships between roles are declared. Then, in the sec- 
ond block, range restrictions for certain roles are imposed. Below, in the third 
block for Cylinderpart a so-called cover axiom is given. Moreover, additional ax- 
ioms ensure the disjointness of more specific subconcepts of Cylinderpart (D is a 
subconcept of C iff C subsumes D). 

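Cylinderpart $\sqsubseteq$ Piston $\sqcup$ Piston_Rod $\sqcup$ Valve,    Piston $\sqsubseteq$ $\neg$Piston_Rod $\sqcap$ $\neg$Valve,
Piston_Rod $\sqsubseteq$ $\neg$Piston $\sqcap$ $\neg$Valve,              Valve $\sqsubseteq$ $\neg$Piston $\sqcap$ $\neg$Piston_Rod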



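The cylinder example is translated as follows (the term $\lambda_{Vol}\,c\,.\,(\ldots)$ is a unary
predicate of a numeric concrete domain for the dimension Volume with unit
$m^3$).

Cylinder $\sqsubseteq$ Motorpart $\sqcap$ part_of $\sqcap$
    $\exists$ displacement . $\lambda_{Vol}\,c\,.\,(0.001 \leq c \leq 1)$ $\sqcap$
    $\forall$ has_part . Cylinderpart $\sqcap$
    $\exists_{\geq 4}$ has_cylinder_part $\sqcap$ $\exists_{\leq 6}$ has_cylinder_part $\sqcap$
    $\exists_{=1}$ has_piston_part $\sqcap$ $\exists_{=1}$ has_piston_rod_part $\sqcap$
    $\exists_{\geq 2}$ has_valve_part $\sqcap$ $\exists_{\leq 4}$ has_valve_part

We assume that displacement is declared as a feature. Furthermore, let $\exists_{=n}\,R$ be
an abbreviation for $\exists_{\geq n}\,R \sqcap \exists_{\leq n}\,R$. In our example, the ABox being used is very
simple: $\mathcal{A} = \{a : \text{Cylinder} \sqcap \exists\,\text{displacement}\,.\,\lambda_{Vol}\,c\,.\,(c \geq 0.5)\}$.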

To solve the problem of constructing a Cylinder, the knowledge base $(\mathcal{T}, \mathcal{A})$
is tested for consistency. If the knowledge base is consistent, there exists a model
which can be considered as a solution (see [3]). Note that $(\mathcal{T}, \mathcal{A})$ is only a very
simplified example of a representation of a configuration problem. For instance,
using an ABox with additional assertions it is possible to explicitly specify some
required cylinder parts etc. In order to actually compute a solution to a configuration
problem, a sound and complete calculus for the $\mathcal{ALCNH}_{R^+}(\mathcal{D})^-$ knowledge
base consistency problem is required that terminates on any input.

4 A Tableaux Calculus for $\mathcal{ALCNH}_{R^+}(\mathcal{D})^-$

In the following a calculus to decide the consistency of an $\mathcal{ALCNH}_{R^+}(\mathcal{D})^-$
knowledge base $(\mathcal{T}, \mathcal{A})$ is devised. As a first step, the original ABox $\mathcal{A}$ of the
knowledge base is transformed w.r.t. the TBox $\mathcal{T}$. The idea is to derive an ABox
$\mathcal{A}_{\mathcal{T}}$ that is consistent (w.r.t. an empty TBox) iff $(\mathcal{T}, \mathcal{A})$ is consistent. The calculus
introduced below is applied to $\mathcal{A}_{\mathcal{T}}$.

In order to define the transformation steps for deriving $\mathcal{A}_{\mathcal{T}}$, we have to
introduce a few technical terms. First, for any concept term we define its negation
normal form. A concept is in negation normal form iff negation signs may occur
only in front of concept names.

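Every $\mathcal{ALCNH}_{R^+}(\mathcal{D})^-$ concept term $C$ can be transformed into negation
normal form $nnf(C)$ by recursively applying the following transformation rules
to subconcepts from left to right:

$\neg(C \sqcap D) \rightarrow \neg C \sqcup \neg D$,   $\neg(C \sqcup D) \rightarrow \neg C \sqcap \neg D$,   $\neg\forall R\,.\,C \rightarrow \exists R\,.\,\neg C$,

$\neg\exists R\,.\,C \rightarrow \forall R\,.\,\neg C$,   $\neg\neg C \rightarrow C$,   $\neg\exists_{\geq n}\,S \rightarrow \exists_{\leq n-1}\,S$,   $\neg\exists_{\leq m}\,S \rightarrow \exists_{\geq m+1}\,S$,

$\neg\forall f\,.\,\bot_{\mathcal{D}} \rightarrow \exists f\,.\,\top_{\mathcal{D}}$,   $\neg\exists f_1, \ldots, f_n\,.\,P \rightarrow \exists f_1, \ldots, f_n\,.\,\overline{P} \sqcup \forall f_1\,.\,\bot_{\mathcal{D}} \sqcup \ldots \sqcup \forall f_n\,.\,\bot_{\mathcal{D}}$

where $\overline{P}$ is the negation of $P$.

If no rule is applicable, the resulting concept is in negation normal form and
all models of $C$ are also models of $nnf(C)$ and vice versa. The transformation is
possible in linear time.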
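The rewrite rules above can be sketched as a recursive function. The tuple encoding of concept terms (`("atom", name)`, `("not", C)`, `("and", C, D)`, `("or", C, D)`, `("all", R, C)`, `("some", R, C)`, `("atleast", n, S)`, `("atmost", n, S)`, `("no_feature", f)` for $\forall f\,.\,\bot_{\mathcal{D}}$, `("pred", features, P)` for $\exists f_1, \ldots, f_n\,.\,P$) and the `not_` naming scheme for negated predicates are assumptions made for this illustration only.

```python
def negate_predicate(name):
    """Hypothetical naming scheme: the negation of P is called not_P."""
    return name[4:] if name.startswith("not_") else "not_" + name


def nnf(c):
    """Push negation inwards until it only occurs in front of concept names."""
    tag = c[0]
    if tag == "not":
        return _negate(c[1])
    if tag in ("and", "or"):
        return (tag, nnf(c[1]), nnf(c[2]))
    if tag in ("all", "some"):
        return (tag, c[1], nnf(c[2]))
    return c  # atoms, number restrictions and concrete-domain concepts


def _negate(c):
    """Return the negation normal form of ¬c."""
    tag = c[0]
    if tag == "atom":
        return ("not", c)
    if tag == "not":                      # ¬¬C → C
        return nnf(c[1])
    if tag == "and":                      # ¬(C ⊓ D) → ¬C ⊔ ¬D
        return ("or", _negate(c[1]), _negate(c[2]))
    if tag == "or":                       # ¬(C ⊔ D) → ¬C ⊓ ¬D
        return ("and", _negate(c[1]), _negate(c[2]))
    if tag == "all":                      # ¬∀R.C → ∃R.¬C
        return ("some", c[1], _negate(c[2]))
    if tag == "some":                     # ¬∃R.C → ∀R.¬C
        return ("all", c[1], _negate(c[2]))
    if tag == "atleast":                  # ¬∃≥n S → ∃≤n-1 S  (n > 0)
        return ("atmost", c[1] - 1, c[2])
    if tag == "atmost":                   # ¬∃≤m S → ∃≥m+1 S
        return ("atleast", c[1] + 1, c[2])
    if tag == "no_feature":               # ¬∀f.⊥ → ∃f.⊤
        return ("pred", (c[1],), "top")
    if tag == "pred":                     # ¬∃f1..fn.P → ∃f1..fn.P̄ ⊔ ∀f1.⊥ ⊔ ... ⊔ ∀fn.⊥
        result = ("pred", c[1], negate_predicate(c[2]))
        for f in c[1]:
            result = ("or", result, ("no_feature", f))
        return result
    raise ValueError("unknown concept constructor: %r" % (tag,))
```

Each subterm is visited once, which matches the linear-time claim as long as the result of negating a predicate restriction is counted by the number of features it mentions.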

Definition 1 (Additional ABox Assertions). Let $C$ be a concept term,
$a, b \in \mathbf{O}$ be individual names, and $x \notin \mathbf{O} \cup \mathbf{O}_C$, then the following expressions
are also assertional axioms: $\forall x\,.\,x{:}C$ (universal concept assertion)¹, $a \neq b$
(inequality assertion).

An interpretation $\mathcal{I}_{\mathcal{D}}$ satisfies an assertional axiom $\forall x\,.\,x{:}C$ iff $C^{\mathcal{I}} = \Delta_{\mathcal{I}}$ and
$a \neq b$ iff $a^{\mathcal{I}} \neq b^{\mathcal{I}}$.

Definition 2 (Fork, Fork Elimination). If it holds that
$\{(a,x_1){:}f, (a,x_2){:}f\} \subseteq \mathcal{A}$ then there exists a fork in $\mathcal{A}$. In case of a fork w.r.t.
$x_1, x_2$, the replacement of every occurrence of $x_2$ in $\mathcal{A}$ by $x_1$ is called fork
elimination.

Definition 3 (Augmented ABox). For an initial ABox $\mathcal{A}$ we define its augmented
ABox $\mathcal{A}_{\mathcal{T}}$ w.r.t. a TBox $\mathcal{T}$ by applying the following transformation rules
to $\mathcal{A}$. First of all, all forks in $\mathcal{A}$ are eliminated (note that the unique name
assumption is not imposed on concrete objects). Then, for every GCI $C \sqsubseteq D$
in $\mathcal{T}$ the assertion $\forall x\,.\,x{:}(\neg C \sqcup D)$ is added to $\mathcal{A}$. Every concept term occurring
in $\mathcal{A}$ is transformed into its negation normal form. Let $\mathbf{O}_{\mathcal{A}} = \{a_1, \ldots, a_n\}$
be the set of individuals mentioned in $\mathcal{A}$, then the set of inequality assertions
$\{a_i \neq a_j \mid a_i, a_j \in \mathbf{O}_{\mathcal{A}},\ i, j \in 1..n,\ i \neq j\}$ is added to $\mathcal{A}$.
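Two of the augmentation steps, fork elimination and the generation of inequality assertions, can be sketched as follows. The encoding is an assumption for illustration: a feature assertion $(a,x){:}f$ is the tuple `("feat", a, x, f)`, a predicate assertion $(x_1, \ldots, x_n){:}P$ is `("pred", (x1, ..., xn), P)`, and an inequality $a \neq b$ is `("neq", a, b)`. Which of the two concrete objects of a fork survives is decided here by a simple lexical order, which is a choice of this sketch rather than of the definition.

```python
def rename(assertion, old, new):
    """Replace the concrete object `old` by `new` inside one assertion."""
    if assertion[0] == "feat":
        tag, a, x, f = assertion
        return (tag, a, new if x == old else x, f)
    if assertion[0] == "pred":
        tag, xs, p = assertion
        return (tag, tuple(new if x == old else x for x in xs), p)
    return assertion


def eliminate_forks(abox):
    """Merge concrete objects until no individual has two fillers for one feature."""
    abox = set(abox)
    while True:
        seen = {}
        fork = None
        for assertion in sorted(abox, key=repr):  # deterministic scan order
            if assertion[0] == "feat":
                _, a, x, f = assertion
                if (a, f) in seen and seen[(a, f)] != x:
                    fork = (seen[(a, f)], x)
                    break
                seen[(a, f)] = x
        if fork is None:
            return abox
        keep, drop = fork
        abox = {rename(assertion, drop, keep) for assertion in abox}


def inequality_assertions(individuals):
    """All assertions a_i ≠ a_j for distinct individuals mentioned in the ABox."""
    return {("neq", a, b) for a in individuals for b in individuals if a != b}
```

Renaming is applied to predicate assertions as well, so constraints stated on the eliminated object carry over to the object that remains.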

In order to check the consistency of an $\mathcal{ALCNH}_{R^+}(\mathcal{D})^-$ knowledge base
$(\mathcal{T}, \mathcal{A})$, the augmented ABox $\mathcal{A}_{\mathcal{T}}$ is computed. Then, a set of so-called completion
rules (see below) is applied to the augmented ABox $\mathcal{A}_{\mathcal{T}}$. The rules are applied
in accordance with a completion strategy.

Lemma 1. A knowledge base $(\mathcal{T}, \mathcal{A})$ is consistent if and only if $\mathcal{A}_{\mathcal{T}}$ is consistent
(w.r.t. an empty TBox).

The proof is straightforward; for details see [6].

The tableaux rules require the notion of blocking their applicability. This is 
based on so-called concept sets, an ordering for new individuals and concrete 
objects, and the notion of a blocking individual. 

¹ $\forall x\,.\,x{:}C$ is to be read as $\forall x\,.\,(x{:}C)$.
Definition 4 (Ordering). We define an individual ordering $\prec$ for new
individuals (elements of $\mathbf{O}_N$) occurring in an ABox $\mathcal{A}$. If $b \in \mathbf{O}_N$ is introduced in $\mathcal{A}$,
then $a \prec b$ for all new individuals $a$ already present in $\mathcal{A}$. A concrete object
ordering $\prec_C$ for elements of $\mathbf{O}_C$ occurring in an ABox $\mathcal{A}$ is defined as follows. If
$y \in \mathbf{O}_C$ is introduced in $\mathcal{A}$, then $x \prec_C y$ for all concrete objects $x$ already present
in $\mathcal{A}$.



Definition 5 (Concept Set, Blocking Individual, Blocked by). Given
an ABox $\mathcal{A}$ and an individual $a$ occurring in $\mathcal{A}$, we define the concept set of
$a$ as $\sigma(\mathcal{A}, a) := \{C \mid a{:}C \in \mathcal{A}\}$. Let $\mathcal{A}$ be an ABox and $a, b \in \mathbf{O}_N$ be individuals
in $\mathcal{A}$. We call $a$ the blocking individual of $b$ if the following conditions hold:
$\sigma(\mathcal{A}, a) \supseteq \sigma(\mathcal{A}, b)$ and $a \prec b$. If $a$ is a blocking individual for $b$, then $b$ is said to
be blocked by $a$. An individual $b$ mentioned in an ABox $\mathcal{A}$ is said to be blocked
(in $\mathcal{A}$) iff there exists a blocking individual for $b$ in $\mathcal{A}$.
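The blocking condition can be checked directly from the definition. In this sketch, which rests on assumptions of its own, a concept assertion $a{:}C$ is the tuple `("inst", a, C)` and the ordering $\prec$ on new individuals is represented by a list `order` holding them in the order of their introduction; the function names are hypothetical.

```python
def concept_set(abox, a):
    """sigma(A, a): all concepts asserted for individual a."""
    return {x[2] for x in abox if x[0] == "inst" and x[1] == a}


def blocking_individuals(abox, order, b):
    """New individuals introduced before b whose concept set includes that of b."""
    target = concept_set(abox, b)
    earlier = order[:order.index(b)]
    return [a for a in earlier if concept_set(abox, a) >= target]


def is_blocked(abox, order, b):
    """b is blocked iff at least one blocking individual exists."""
    return bool(blocking_individuals(abox, order, b))
```

The test is a superset check on concept sets, so an individual created later with fewer (or the same) concept assertions than an earlier one is blocked by it.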

4.1 Completion Rules 

We are now ready to define the completion rules that are intended to generate
a so-called completion (see also below) of an ABox $\mathcal{A}_{\mathcal{T}}$. From this point on, if
we refer to an ABox $\mathcal{A}$, we always consider ABoxes derived from $\mathcal{A}_{\mathcal{T}}$.

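Definition 6 (Completion Rules).

R$\sqcap$ The conjunction rule.
if 1. $a{:}C \sqcap D \in \mathcal{A}$, and
   2. $\{a{:}C, a{:}D\} \not\subseteq \mathcal{A}$
then $\mathcal{A}' = \mathcal{A} \cup \{a{:}C, a{:}D\}$

R$\sqcup$ The disjunction rule (nondeterministic).
if 1. $a{:}C \sqcup D \in \mathcal{A}$, and
   2. $\{a{:}C, a{:}D\} \cap \mathcal{A} = \emptyset$
then $\mathcal{A}' = \mathcal{A} \cup \{a{:}C\}$ or $\mathcal{A}' = \mathcal{A} \cup \{a{:}D\}$

R$\forall$C The role value restriction rule.
if 1. $a{:}\forall R\,.\,C \in \mathcal{A}$, and
   2. $\exists b \in \mathbf{O}, S \in R^{\downarrow} : (a,b){:}S \in \mathcal{A}$, and
   3. $b{:}C \notin \mathcal{A}$
then $\mathcal{A}' = \mathcal{A} \cup \{b{:}C\}$

R$\forall_+$C The transitive role value restriction rule.
if 1. $a{:}\forall R\,.\,C \in \mathcal{A}$, and
   2. $\exists b \in \mathbf{O}, T \in R^{\downarrow}, T \in \mathbf{T}, S \in T^{\downarrow} : (a,b){:}S \in \mathcal{A}$, and
   3. $b{:}\forall T\,.\,C \notin \mathcal{A}$
then $\mathcal{A}' = \mathcal{A} \cup \{b{:}\forall T\,.\,C\}$

R$\forall_x$ The universal concept restriction rule.
if 1. $\forall x\,.\,x{:}C \in \mathcal{A}$, and
   2. $\exists a \in \mathbf{O}$ : $a$ mentioned in $\mathcal{A}$, and
   3. $a{:}C \notin \mathcal{A}$
then $\mathcal{A}' = \mathcal{A} \cup \{a{:}C\}$

R$\exists$C The role exists restriction rule (generating).
if 1. $a{:}\exists R\,.\,C \in \mathcal{A}$, and
   2. $a$ is not blocked, and
   3. $\neg\exists b \in \mathbf{O}, S \in R^{\downarrow} : \{(a,b){:}S, b{:}C\} \subseteq \mathcal{A}$
then $\mathcal{A}' = \mathcal{A} \cup \{(a,b){:}R, b{:}C\}$ where $b \in \mathbf{O}_N$ is not used in $\mathcal{A}$

R$\exists_{\geq n}$ The number restriction exists rule (generating).
if 1. $a{:}\exists_{\geq n}\,R \in \mathcal{A}$, and
   2. $a$ is not blocked, and
   3. $\neg\exists b_1, \ldots, b_n \in \mathbf{O}, S_1, \ldots, S_n \in R^{\downarrow}$ :
      $\{(a,b_k){:}S_k \mid k \in 1..n\} \cup \{b_i \neq b_j \mid i,j \in 1..n,\ i \neq j\} \subseteq \mathcal{A}$
then $\mathcal{A}' = \mathcal{A} \cup \{(a,b_k){:}R \mid k \in 1..n\} \cup \{b_i \neq b_j \mid i,j \in 1..n,\ i \neq j\}$
   where $b_1, \ldots, b_n \in \mathbf{O}_N$ are not used in $\mathcal{A}$

R$\exists_{\leq n}$ The number restriction merge rule (nondeterministic).
if 1. $a{:}\exists_{\leq n}\,R \in \mathcal{A}$, and
   2. $\exists b_1, \ldots, b_m \in \mathbf{O}, S_1, \ldots, S_m \in R^{\downarrow} : \{(a,b_1){:}S_1, \ldots, (a,b_m){:}S_m\} \subseteq \mathcal{A}$
      with $m > n$, and
   3. $\exists b_i, b_j \in \{b_1, \ldots, b_m\} : i \neq j,\ b_i \neq b_j \notin \mathcal{A}$
then $\mathcal{A}' = \mathcal{A}[b_i/b_j]$, i.e. replace every occurrence of $b_i$ in $\mathcal{A}$ by $b_j$

R$\exists$P The predicate exists rule (generating).
if 1. $a{:}\exists f_1, \ldots, f_n\,.\,P \in \mathcal{A}$, and
   2. $\neg\exists x_1, \ldots, x_n \in \mathbf{O}_C : \{(a,x_1){:}f_1, \ldots, (a,x_n){:}f_n, (x_1, \ldots, x_n){:}P\} \subseteq \mathcal{A}$
then $\mathcal{A}' = \mathcal{A} \cup \{(a,x_1){:}f_1, \ldots, (a,x_n){:}f_n, (x_1, \ldots, x_n){:}P\}$
   where $x_1, \ldots, x_n \in \mathbf{O}_C$ are not used in $\mathcal{A}$,
   eliminate all forks $\{(a,x){:}f_i, (a,x_i){:}f_i\} \subseteq \mathcal{A}$
   such that $(a,x){:}f_i$ remains in $\mathcal{A}$ if $x \prec_C x_i$, $i \in 1..n$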

We call the rules R$\sqcup$ and R$\exists_{\leq n}$ nondeterministic rules since they can be
applied in different ways to the same ABox. The remaining rules are called
deterministic rules. Moreover, we call the rules R$\exists$C, R$\exists_{\geq n}$ and R$\exists$P generating
rules since they can introduce new individuals or concrete objects.
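The difference between a deterministic and a nondeterministic rule shows up in the return type of a rule implementation. The sketch below covers only the conjunction and the disjunction rule and assumes a tuple encoding that is not part of the paper: a concept assertion $a{:}C$ is `("inst", a, C)`, and concepts are tuples such as `("atom", "A")`, `("and", C, D)` and `("or", C, D)`.

```python
def apply_conjunction(abox):
    """Conjunction rule: return the extended ABox, or None if not applicable."""
    for x in sorted(abox, key=repr):  # deterministic scan order
        if x[0] == "inst" and x[2][0] == "and":
            _, a, (_, c, d) = x
            new = {("inst", a, c), ("inst", a, d)}
            if not new <= abox:       # condition 2: not both conjuncts present
                return abox | new
    return None


def apply_disjunction(abox):
    """Disjunction rule: return both alternative ABoxes, or None if not applicable."""
    for x in sorted(abox, key=repr):
        if x[0] == "inst" and x[2][0] == "or":
            _, a, (_, c, d) = x
            # condition 2: neither disjunct is asserted for a yet
            if ("inst", a, c) not in abox and ("inst", a, d) not in abox:
                return [abox | {("inst", a, c)}, abox | {("inst", a, d)}]
    return None
```

A deterministic rule yields one successor ABox, while the disjunction rule yields a list of alternatives that a search procedure would have to explore, for example by backtracking.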

Given an ABox $\mathcal{A}$, more than one rule might be applicable to $\mathcal{A}$. This is
controlled by a completion strategy in accordance with the ordering for new
individuals (see Definition 4).

Definition 7 (Completion Strategy). We define a completion strategy that
must observe the following restrictions:

— Meta rules:

• Apply a rule to an individual $b \in \mathbf{O}_N$ only if no rule is applicable to an
individual $a \in \mathbf{O}_O$.

• Apply a rule to an individual $b \in \mathbf{O}_N$ only if no rule is applicable to
another individual $a \in \mathbf{O}_N$ such that $a \prec b$.

— The completion rules are always applied in the following order. A step is
skipped in case the corresponding set of applicable rules is empty.

1. Apply all nongenerating rules (R$\sqcap$, R$\sqcup$, R$\forall$C, R$\forall_+$C, R$\forall_x$, R$\exists_{\leq n}$) as
long as possible.
2. Apply a generating rule (R$\exists$C, R$\exists_{\geq n}$, R$\exists$P) and restart with step 1 as
long as possible.

In the following we always assume that rules are applied in accordance with
this strategy. It ensures that the rules are applied to new individuals w.r.t. the
ordering $\prec$, which guarantees a breadth-first order. No rules are applied if a
so-called clash is discovered.

Definition 8 (Clash, Clash Triggers, Completion). We assume the same
naming conventions as used above. An ABox $\mathcal{A}$ contains a clash if one of the
following clash triggers is applicable. If none of the clash triggers is applicable
to $\mathcal{A}$, then $\mathcal{A}$ is called clash-free.

— Primitive clash: $\{a{:}C, a{:}\neg C\} \subseteq \mathcal{A}$

— Number restriction merging clash:
$\exists S_1, \ldots, S_m \in R^{\downarrow} : \{a{:}\exists_{\leq n}\,R\} \cup \{(a,b_i){:}S_i \mid i \in 1..m\} \cup
\{b_i \neq b_j \mid i, j \in 1..m,\ i \neq j\} \subseteq \mathcal{A}$ with $m > n$

— No concrete domain feature clash: $\{(a,x){:}f, a{:}\forall f\,.\,\bot_{\mathcal{D}}\} \subseteq \mathcal{A}$.

— Concrete domain predicate clash: $(x_1^{(1)}, \ldots, x_{n_1}^{(1)}){:}P_1 \in \mathcal{A}, \ldots,
(x_1^{(k)}, \ldots, x_{n_k}^{(k)}){:}P_k \in \mathcal{A}$ and the conjunction $\bigwedge_{i=1}^{k} P_i(x_1^{(i)}, \ldots, x_{n_i}^{(i)})$ is not
satisfiable in $\mathcal{D}$. Note that this can be decided since $\mathcal{D}$ is required to be admissible.

A clash-free ABox $\mathcal{A}$ is called complete if no completion rule is applicable to $\mathcal{A}$.
A complete ABox $\mathcal{A}'$ derived from an ABox $\mathcal{A}$ is also called a completion of $\mathcal{A}$.
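Two of the clash triggers are purely syntactic and can be tested by membership checks. The following sketch assumes a tuple encoding that is not taken from the paper: `("inst", a, C)` for $a{:}C$, `("not", C)` for $\neg C$, `("feat", a, x, f)` for $(a,x){:}f$ and `("no_feature", f)` for $\forall f\,.\,\bot_{\mathcal{D}}$. The concrete domain predicate clash is left out because it needs a decision procedure for the concrete domain.

```python
def has_primitive_clash(abox):
    """True iff some individual is asserted to be in both C and ¬C."""
    return any(
        x[0] == "inst" and ("inst", x[1], ("not", x[2])) in abox
        for x in abox
    )


def has_feature_clash(abox):
    """True iff (a, x):f and a:∀f.⊥ occur together."""
    return any(
        x[0] == "feat" and ("inst", x[1], ("no_feature", x[3])) in abox
        for x in abox
    )
```

With ABoxes stored as Python sets, each check is a single pass with constant-time lookups.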

Any ABox containing a clash is obviously unsatisfiable. The purpose of the
calculus is to generate a completion for an initial ABox $\mathcal{A}_{\mathcal{T}}$ that proves the
consistency of $\mathcal{A}_{\mathcal{T}}$, or its inconsistency if no completion can be found.

4.2 Decidability of the $\mathcal{ALCNH}_{R^+}(\mathcal{D})^-$ ABox Consistency Problem

In order to show that the calculus introduced above is correct, first the local 
correctness of the rules is proven. 

Proposition 1 (Invariance). Let A and A' be ABoxes. Then: 

1. If A' is derived from A by applying a deterministic rule, then A is consistent 
iff A' is consistent. 

2. If A' is derived from A by applying a nondeterministic rule, then A is consis- 
tent if A' is consistent. Conversely, if A is consistent and a nondeterministic 
rule is applicable to A, then it can be applied in such a way that it yields an 
ABox A' which is consistent. 

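Proof. 1. “$\Leftarrow$” Due to the structure of the deterministic rules one can immediately
verify that $\mathcal{A}$ is a subset of $\mathcal{A}'$. Therefore, $\mathcal{A}$ is consistent if $\mathcal{A}'$ is consistent.

“$\Rightarrow$” In order to show that $\mathcal{A}'$ is consistent after applying a deterministic
rule to the consistent ABox $\mathcal{A}$, we examine each applicable rule separately. We
assume that $\mathcal{I}_{\mathcal{D}} = (\Delta_{\mathcal{I}}, \Delta_{\mathcal{D}}, \cdot^{\mathcal{I}})$ satisfies $\mathcal{A}$. Then, by definition of $\sqsubseteq^*$ it holds
that $R^{\mathcal{I}} \subseteq S^{\mathcal{I}}$ if $(R, S) \in\ \sqsubseteq^*$.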







V. Haarslev, R. Moller, and M. Wessel 



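If the conjunction rule is applied to $a{:}C \sqcap D \in \mathcal{A}$, then we get a new ABox
$\mathcal{A}' = \mathcal{A} \cup \{a{:}C, a{:}D\}$. Since $\mathcal{I}_{\mathcal{D}}$ satisfies $a{:}C \sqcap D$, $\mathcal{I}_{\mathcal{D}}$ satisfies $a{:}C$ and $a{:}D$ and
therefore $\mathcal{A}'$.

If the role value restriction rule is applied to $a{:}\forall R\,.\,C \in \mathcal{A}$, then there must
be a role assertion $(a,b){:}S \in \mathcal{A}$ with $S \in R^{\downarrow}$ and $\mathcal{A}' = \mathcal{A} \cup \{b{:}C\}$. $\mathcal{I}_{\mathcal{D}}$ satisfies $\mathcal{A}$,
hence it holds that $(a^{\mathcal{I}}, b^{\mathcal{I}}) \in S^{\mathcal{I}}$, $S^{\mathcal{I}} \subseteq R^{\mathcal{I}}$. Since $\mathcal{I}_{\mathcal{D}}$ satisfies $a{:}\forall R\,.\,C$, $b^{\mathcal{I}} \in C^{\mathcal{I}}$
must hold. Thus, $\mathcal{I}_{\mathcal{D}}$ satisfies $b{:}C$ and therefore $\mathcal{A}'$.

If the transitive role value restriction rule is applied to $a{:}\forall R\,.\,C \in \mathcal{A}$, there
must be an assertion $(a,b){:}S \in \mathcal{A}$ with $S \in T^{\downarrow}$ for some $T \in \mathbf{T}$ and $T \in R^{\downarrow}$ such
that we get $\mathcal{A}' = \mathcal{A} \cup \{b{:}\forall T\,.\,C\}$. Since $\mathcal{I}_{\mathcal{D}}$ satisfies $\mathcal{A}$, we have $a^{\mathcal{I}} \in (\forall R\,.\,C)^{\mathcal{I}}$
and $(a^{\mathcal{I}}, b^{\mathcal{I}}) \in S^{\mathcal{I}}$, $S^{\mathcal{I}} \subseteq T^{\mathcal{I}} \subseteq R^{\mathcal{I}}$. It holds that $b^{\mathcal{I}} \in (\forall T\,.\,C)^{\mathcal{I}}$ unless there is
some $z \in \Delta_{\mathcal{I}}$ with $(b^{\mathcal{I}}, z) \in T^{\mathcal{I}}$ and $z \notin C^{\mathcal{I}}$. Since $T$ is transitive, $(a^{\mathcal{I}}, z) \in T^{\mathcal{I}}$
and $a^{\mathcal{I}} \notin (\forall R\,.\,C)^{\mathcal{I}}$, in contradiction to the assumption that $\mathcal{I}_{\mathcal{D}}$ satisfies $\mathcal{A}$. Hence,
$\mathcal{I}_{\mathcal{D}}$ must satisfy $b{:}\forall T\,.\,C$ and therefore $\mathcal{I}_{\mathcal{D}}$ is a model for $\mathcal{A}'$.

If the universal concept restriction rule is applied to an individual $a$ in $\mathcal{A}$
because of $\forall x\,.\,x{:}C \in \mathcal{A}$, then $\mathcal{A}' = \mathcal{A} \cup \{a{:}C\}$. Since $\mathcal{I}_{\mathcal{D}}$ satisfies $\mathcal{A}$, it holds
that $C^{\mathcal{I}} = \Delta_{\mathcal{I}}$. Thus, it holds that $a^{\mathcal{I}} \in C^{\mathcal{I}}$ and $\mathcal{I}_{\mathcal{D}}$ satisfies $\mathcal{A}'$.

If the role exists restriction rule is applied to $a{:}\exists R\,.\,C \in \mathcal{A}$, then we get the
ABox $\mathcal{A}' = \mathcal{A} \cup \{(a,b){:}R, b{:}C\}$. Since $\mathcal{I}_{\mathcal{D}}$ satisfies $\mathcal{A}$, there exists a $y \in \Delta_{\mathcal{I}}$ such
that $(a^{\mathcal{I}}, y) \in R^{\mathcal{I}}$ and $y \in C^{\mathcal{I}}$. We define the interpretation function $\cdot^{\mathcal{I}'}$ such that
$b^{\mathcal{I}'} := y$ and $x^{\mathcal{I}'} := x^{\mathcal{I}}$ for $x \neq b$. Hence, $\mathcal{I}'_{\mathcal{D}} = (\Delta_{\mathcal{I}}, \Delta_{\mathcal{D}}, \cdot^{\mathcal{I}'})$ satisfies $\mathcal{A}'$.

If the number restriction exists rule is applied to $a{:}\exists_{\geq n}\,R \in \mathcal{A}$, then we get
$\mathcal{A}' = \mathcal{A} \cup \{(a,b_k){:}R \mid k \in 1..n\} \cup \{b_i \neq b_j \mid i,j \in 1..n,\ i \neq j\}$. Since $\mathcal{I}_{\mathcal{D}}$ satisfies $\mathcal{A}$,
there must exist $n$ distinct individuals $y_i \in \Delta_{\mathcal{I}}$, $i \in 1..n$, such that $(a^{\mathcal{I}}, y_i) \in R^{\mathcal{I}}$.
We define the interpretation function $\cdot^{\mathcal{I}'}$ such that $b_i^{\mathcal{I}'} := y_i$ and $x^{\mathcal{I}'} := x^{\mathcal{I}}$ for
$x \notin \{b_1, \ldots, b_n\}$. Hence, $\mathcal{I}'_{\mathcal{D}} = (\Delta_{\mathcal{I}}, \Delta_{\mathcal{D}}, \cdot^{\mathcal{I}'})$ satisfies $\mathcal{A}'$.

If the predicate exists rule is applied to $a{:}\exists f_1, \ldots, f_n\,.\,P \in \mathcal{A}$, then we get the
ABox $\mathcal{A}' = \mathcal{A} \cup \{(x_1, \ldots, x_n){:}P, (a,x_1){:}f_1, \ldots, (a,x_n){:}f_n\}$. After fork elimination,
some $x_i$ may be replaced by $z_i$ with $z_i \prec_C x_i$. Since $\mathcal{I}_{\mathcal{D}}$ satisfies $\mathcal{A}$, there exist
$y_1, \ldots, y_n \in \Delta_{\mathcal{D}}$ such that $\forall i \in \{1, \ldots, n\} : (a^{\mathcal{I}}, y_i) \in f_i^{\mathcal{I}}$ and $(y_1, \ldots, y_n) \in P^{\mathcal{I}}$.
We define the interpretation function $\cdot^{\mathcal{I}'}$ such that $x_i^{\mathcal{I}'} := y_i$ for all $x_i$ not replaced
by $z_i$ and $(y_1, \ldots, y_n) \in P^{\mathcal{I}'}$. The fork elimination strategy used in the R$\exists$P rule
guarantees that concrete objects introduced in previous steps are not eliminated.
Thus, it is ensured that the interpretation of $x_i$ is not changed in $\mathcal{I}'_{\mathcal{D}}$. It is easy
to see that $\mathcal{I}'_{\mathcal{D}} = (\Delta_{\mathcal{I}}, \Delta_{\mathcal{D}}, \cdot^{\mathcal{I}'})$ satisfies $\mathcal{A}'$.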

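2. “$\Leftarrow$” Assume that $\mathcal{A}'$ is satisfied by $\mathcal{I}'_{\mathcal{D}} = (\Delta_{\mathcal{I}}, \Delta_{\mathcal{D}}, \cdot^{\mathcal{I}'})$. By examining the
nondeterministic rules we show that $\mathcal{A}$ is also consistent.

If $\mathcal{A}'$ is obtained from $\mathcal{A}$ by applying the disjunction rule, then $\mathcal{A}$ is a subset
of $\mathcal{A}'$ and therefore satisfied by $\mathcal{I}'_{\mathcal{D}}$.

If $\mathcal{A}'$ is obtained from $\mathcal{A}$ by applying the number restriction merge rule to
$a{:}\exists_{\leq n}\,R \in \mathcal{A}$, then there exist $b_i, b_j$ in $\mathcal{A}$ such that $\mathcal{A}' = \mathcal{A}[b_i/b_j]$. We define the
interpretation function $\cdot^{\mathcal{I}}$ such that $b_i^{\mathcal{I}} := b_j^{\mathcal{I}'}$ and $x^{\mathcal{I}} := x^{\mathcal{I}'}$ for every $x \neq b_i$.
Obviously, $\mathcal{I}_{\mathcal{D}} = (\Delta_{\mathcal{I}}, \Delta_{\mathcal{D}}, \cdot^{\mathcal{I}})$ satisfies $\mathcal{A}$.

“$\Rightarrow$” We suppose that $\mathcal{I}_{\mathcal{D}} = (\Delta_{\mathcal{I}}, \Delta_{\mathcal{D}}, \cdot^{\mathcal{I}})$ satisfies $\mathcal{A}$ and a nondeterministic
rule is applicable to an individual $a$ in $\mathcal{A}$.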
If the disjunction rule is applicable to $a{:}C \sqcup D \in \mathcal{A}$ and $\mathcal{A}$ is consistent, it
holds that $a^{\mathcal{I}} \in (C \sqcup D)^{\mathcal{I}}$. It follows that either $a^{\mathcal{I}} \in C^{\mathcal{I}}$ or $a^{\mathcal{I}} \in D^{\mathcal{I}}$ (or both). Hence,
the disjunction rule can be applied in a way that $\mathcal{I}_{\mathcal{D}}$ also satisfies the ABox $\mathcal{A}'$.

If the number restriction merge rule is applicable to $a{:}\exists_{\leq n}\,R \in \mathcal{A}$ and $\mathcal{A}$
is consistent, it holds that $a^{\mathcal{I}} \in (\exists_{\leq n}\,R)^{\mathcal{I}}$ and $\|\{b \mid (a^{\mathcal{I}}, b^{\mathcal{I}}) \in R^{\mathcal{I}}\}\| \leq n$. However, it
also holds that $\|\{b \mid (a^{\mathcal{I}}, b^{\mathcal{I}}) \in R^{\mathcal{I}}\}\| \geq m$ with $m > n$. Without loss of generality we
only need to consider the case that $m = n + 1$. Thus, we can conclude by the
Pigeonhole Principle that there exist at least two R-successors $b_i, b_j$ of $a$ such
that $b_i^{\mathcal{I}} = b_j^{\mathcal{I}}$. Since $\mathcal{I}_{\mathcal{D}}$ satisfies $\mathcal{A}$, it must have been possible to map $b_i$ and
$b_j$ to the same domain object, i.e. at least one of the two individuals must be a
new individual. Let us assume $b_i \in \mathbf{O}_N$, then $\mathcal{I}_{\mathcal{D}}$ obviously satisfies $\mathcal{A}[b_i/b_j]$.

In order to define a canonical interpretation from a completion $\mathcal{A}$, the notion
of a specific blocking individual is introduced. We call $a$ the witness of $b$ iff $b$ is
blocked by $a$ and $\neg\exists c$ in $\mathcal{A}$ : $c \in \mathbf{O}_N$, $c \prec a$, $\sigma(\mathcal{A}, c) \supseteq \sigma(\mathcal{A}, b)$. The witness for a
blocked individual is unique (see [6]). Note that the canonical interpretation is
constructed differently from the one described in [7].

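Definition 9. Let $\mathcal{A}$ be a complete ABox that has been derived by the calculus
from an augmented ABox $\mathcal{A}_{\mathcal{T}}$. Since $\mathcal{A}$ is clash-free, there exists a variable
assignment $\alpha$ that satisfies (the conjunction of) all occurring assertions
$(x_1, \ldots, x_n){:}P \in \mathcal{A}$. We define the canonical interpretation $\mathcal{I}_C = (\Delta_{\mathcal{I}_C}, \Delta_{\mathcal{D}}, \cdot^{\mathcal{I}_C})$
w.r.t. $\mathcal{A}$ as follows:

1. $\Delta_{\mathcal{I}_C} := \{a \mid a$ is mentioned in $\mathcal{A}\}$

2. $a^{\mathcal{I}_C} := a$ iff $a$ is mentioned in $\mathcal{A}$

3. $x^{\mathcal{I}_C} := \alpha(x)$ iff $x$ is mentioned in $\mathcal{A}$

4. $a \in A^{\mathcal{I}_C}$ iff $a{:}A \in \mathcal{A}$ and $A$ is a concept name

5. $(a, \alpha(x)) \in f^{\mathcal{I}_C}$ iff $(a,x){:}f \in \mathcal{A}$

6. $(a,b) \in R^{\mathcal{I}_C}$ iff $\exists c_0, \ldots, c_n, d_0, \ldots, d_{n-1}$ mentioned in $\mathcal{A}$

   a) $n \geq 1$, $c_0 = a$, $c_n = b$, and

   b) $(a,c_1){:}S_1, (d_1,c_2){:}S_2, \ldots, (d_{n-2},c_{n-1}){:}S_{n-1}, (d_{n-1},b){:}S_n \in \mathcal{A}$, and

   c) $\forall i \in 1..n-1$ :
      $d_i = c_i$ or
      $d_i$ is a witness for $c_i$, and $(d_i, c_{i+1}){:}S_{i+1} \in \mathcal{A}$, and

   d) if $n > 1$
      $\forall i \in 1..n : \exists R' \in \mathbf{T}, R' \in R^{\downarrow}, S_i \in R'^{\downarrow}$
      else
      $S_i \in R^{\downarrow}$.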

The construction of the canonical interpretation for the case 6 is illustrated with
an example in Figure 2. The following cases can be seen as special cases of case
6 introduced above ($n = 1$, $c_0 = a$, $c_1 = b$):

² Note that the variables $c_0, \ldots, c_n, d_0, \ldots, d_{n-1}$ do not necessarily denote different
individual names.
Fig. 2. Construction of the canonical interpretation. In the lower example we assume
that the individual d2 is a witness for c2 (see text).



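— $c_0 = d_0$: $(a,b) \in R^{\mathcal{I}_C}$ iff $(c_0,c_1){:}S_1 \in \mathcal{A}$ for a role $S_1 \in R^{\downarrow}$.

— $c_0 \neq d_0$: $(a,b) \in R^{\mathcal{I}_C}$ iff $d_0$ is a witness for $c_0$, and
$(d_0,c_1){:}S_1 \in \mathcal{A}$, for a role $S_1 \in R^{\downarrow}$.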

Since the witness of an individual is unique, the canonical interpretation is 
well-defined because there exists a unique blocking individual (witness) for each 
individual that is blocked. 

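Lemma 2 (Soundness). Let $\mathcal{A}$ be a complete ABox that has been derived by
the calculus from an augmented ABox $\mathcal{A}_{\mathcal{T}}$, then $\mathcal{A}_{\mathcal{T}}$ has a model.

Proof. Let $\mathcal{I}_C = (\Delta_{\mathcal{I}_C}, \Delta_{\mathcal{D}}, \cdot^{\mathcal{I}_C})$ be the canonical interpretation for the ABox $\mathcal{A}$
constructed w.r.t. the TBox $\mathcal{T}$. $\mathcal{A}$ is clash-free.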

Features are interpreted in the correct way: There can be no forks in $\mathcal{A}$
because (i) there are no forks in the augmented ABox $\mathcal{A}_{\mathcal{T}}$ and (ii) forks are
immediately eliminated after an application of the R$\exists$P rule. This rule is the
only rule that introduces new assertions of the form $(a,x){:}f \in \mathcal{A}$. Note that
forks cannot be introduced by the R$\exists_{\leq n}$ rule due to the completion strategy.
Thus, $\mathcal{I}_C$ maps features to (partial) functions because the variable assignment $\alpha$
is a function.

All role inclusions in the role hierarchy are satisfied: For every $S \sqsubseteq R$ it holds
that $S^{\mathcal{I}_C} \subseteq R^{\mathcal{I}_C}$. This can be shown as follows. If $(a^{\mathcal{I}_C}, b^{\mathcal{I}_C}) \in S^{\mathcal{I}_C}$, case 6 of
Definition 9 must be applicable. Hence, there exists a chain of sub-roles possibly
with gaps and witnesses (see Definition 9, case 6). Thus, the corresponding
construction for $\mathcal{I}_C$ adding $(a^{\mathcal{I}_C}, b^{\mathcal{I}_C})$ to $S^{\mathcal{I}_C}$ is also applicable to $R$ since $S \in R^{\downarrow}$ (see
6d). Therefore, there is also a tuple $(a^{\mathcal{I}_C}, b^{\mathcal{I}_C}) \in R^{\mathcal{I}_C}$.

All (implicit) transitivity axioms are satisfied, i.e. transitive roles are interpreted
in the correct way: $\forall R \in \mathbf{T} : R^{\mathcal{I}_C} = (R^{\mathcal{I}_C})^+$. If there exist $(a^{\mathcal{I}_C}, b^{\mathcal{I}_C}) \in R^{\mathcal{I}_C}$
and $(b^{\mathcal{I}_C}, c^{\mathcal{I}_C}) \in R^{\mathcal{I}_C}$ then case 6 in Definition 9 must have been applied for each
tuple. But then, a chain of roles from $a$ to $c$ exists as well (possibly with gaps
and witnesses) such that $(a^{\mathcal{I}_C}, c^{\mathcal{I}_C})$ is added to $R^{\mathcal{I}_C}$ as well.

In the following we prove that $\mathcal{I}_C$ satisfies every assertion in $\mathcal{A}$.

For any $a \neq b \in \mathcal{A}$ or $(a,b){:}R \in \mathcal{A}$, $\mathcal{I}_C$ satisfies them by definition.

For any $(a,x){:}f \in \mathcal{A}$, $\mathcal{I}_C$ satisfies them by definition.

For any $(x_1, \ldots, x_n){:}P \in \mathcal{A}$, $\mathcal{I}_C$ satisfies them by definition. Since $\mathcal{A}$ is
clash-free there exists a variable assignment such that the conjunction of all predicate
assertions is satisfied. The variable assignment can be computed because the
concrete domain is required to be admissible.

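Next we consider assertions of the form $a{:}C$. We show by induction on the
structure of concepts that $a{:}C \in \mathcal{A}$ implies $a^{\mathcal{I}_C} \in C^{\mathcal{I}_C}$.

If $C$ is a concept name, then $a^{\mathcal{I}_C} \in C^{\mathcal{I}_C}$ by definition of $\mathcal{I}_C$.

If $C = \neg D$ then $D$ is a concept name since all concepts are in negation normal
form (see Definition 3). $\mathcal{A}$ is clash-free and cannot contain $a{:}D$. Thus, $a^{\mathcal{I}_C} \notin D^{\mathcal{I}_C}$,
i.e. $a^{\mathcal{I}_C} \in \Delta_{\mathcal{I}_C} \setminus D^{\mathcal{I}_C}$. Hence $a^{\mathcal{I}_C} \in (\neg D)^{\mathcal{I}_C}$.

If $C = C_1 \sqcap C_2$ then (since $\mathcal{A}$ is complete) $a{:}C_1 \in \mathcal{A}$ and $a{:}C_2 \in \mathcal{A}$. By induction
hypothesis, $a^{\mathcal{I}_C} \in C_1^{\mathcal{I}_C}$ and $a^{\mathcal{I}_C} \in C_2^{\mathcal{I}_C}$. Hence $a^{\mathcal{I}_C} \in (C_1 \sqcap C_2)^{\mathcal{I}_C}$.

If $C = C_1 \sqcup C_2$ then (since $\mathcal{A}$ is complete) either $a{:}C_1 \in \mathcal{A}$ or $a{:}C_2 \in \mathcal{A}$. By
induction hypothesis, $a^{\mathcal{I}_C} \in C_1^{\mathcal{I}_C}$ or $a^{\mathcal{I}_C} \in C_2^{\mathcal{I}_C}$. Hence $a^{\mathcal{I}_C} \in (C_1 \sqcup C_2)^{\mathcal{I}_C}$.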

If $C = \forall R\,.\,D$, then it must be shown that for all $b^{\mathcal{I}_C}$ with $(a^{\mathcal{I}_C}, b^{\mathcal{I}_C}) \in R^{\mathcal{I}_C}$
it holds that $b^{\mathcal{I}_C} \in D^{\mathcal{I}_C}$. If $(a^{\mathcal{I}_C}, b^{\mathcal{I}_C}) \in R^{\mathcal{I}_C}$, then according to Definition 9, $b$
is a successor of $a$ via a chain of roles $S_i \in R^{\downarrow}$ or there exist corresponding
witnesses as domain elements of $S_i \in R^{\downarrow}$, i.e. the chain might contain “gaps” with
associated witnesses (see Figure 2). Since $(a^{\mathcal{I}_C}, b^{\mathcal{I}_C}) \in R^{\mathcal{I}_C}$ and $S_i^{\mathcal{I}_C} \subseteq R^{\mathcal{I}_C}$ there
exist tuples $(c_i^{\mathcal{I}_C}, c_{i+1}^{\mathcal{I}_C}) \in S_i^{\mathcal{I}_C}$. Due to Definition 9 it holds that $\forall i \in 1..n$ :
$\exists R' \in \mathbf{T}, R' \in R^{\downarrow}, S_i \in R'^{\downarrow}$. Therefore $c_k{:}\forall R'\,.\,D \in \mathcal{A}$ ($k \in 1..n-1$) because $\mathcal{A}$ is
complete. For the same reason $b{:}D \in \mathcal{A}$. By induction hypothesis it holds that
$b^{\mathcal{I}_C} \in D^{\mathcal{I}_C}$. As mentioned before, the chain of roles can have one or more “gaps”
(see Figure 2). However, due to Definition 9 in case of a “gap” there exists a
witness such that a similar argument as in case 6 can be applied, i.e. in case of
a gap between $c_i$ and $c_{i+1}$ with witness $d_i$ for $c_i$, the blocking condition ensures
that the concept set of the witness is a superset of the concept set of the blocked
individual. Since it is assumed that $(d_i, c_{i+1}){:}S_{i+1} \in \mathcal{A}$ and $\mathcal{A}$ is complete it holds
that $c_{i+1}{:}\forall R'\,.\,D \in \mathcal{A}$. Applying the same argument inductively, we can conclude
that $c_{n-1}{:}\forall R'\,.\,D \in \mathcal{A}$ and again, we have $b^{\mathcal{I}_C} \in D^{\mathcal{I}_C}$ by induction hypothesis.

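If $C = \exists R\,.\,D$, then it must be shown that there exists an individual $b^{\mathcal{I}_C} \in \Delta_{\mathcal{I}_C}$
with $(a^{\mathcal{I}_C}, b^{\mathcal{I}_C}) \in R^{\mathcal{I}_C}$ and $b^{\mathcal{I}_C} \in D^{\mathcal{I}_C}$. Since ABox $\mathcal{A}$ is complete, we have either
$(a,b){:}S \in \mathcal{A}$ with $S \in R^{\downarrow}$ and $b{:}D \in \mathcal{A}$ or $a$ is blocked by an individual $c$ and
$(c,b){:}S \in \mathcal{A}$ (again $S \in R^{\downarrow}$). In the first case we have $(a^{\mathcal{I}_C}, b^{\mathcal{I}_C}) \in R^{\mathcal{I}_C}$ by the
definition of $\mathcal{I}_C$ (case 6, $n = 1$, $c_i = d_i$) and $b^{\mathcal{I}_C} \in D^{\mathcal{I}_C}$ by induction hypothesis.
In the second case there exists the witness $c$ with $c{:}\exists R\,.\,D \in \mathcal{A}$. By definition $c$
cannot be blocked, and by hypothesis $\mathcal{A}$ is complete. So we have an individual
$b$ with $(c,b){:}S \in \mathcal{A}$ ($S \in R^{\downarrow}$) and $b{:}D \in \mathcal{A}$. By induction hypothesis we have
$b^{\mathcal{I}_C} \in D^{\mathcal{I}_C}$ and by the definition of $\mathcal{I}_C$ (case 6, $n = 1$, $c_i \neq d_i$, $d_i$ is a witness for
$c_i$, and $a = c_i$, $c = d_i$) we have $(a^{\mathcal{I}_C}, b^{\mathcal{I}_C}) \in R^{\mathcal{I}_C}$.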

If $C = \exists_{\geq n}\,R$, we prove the hypothesis by contradiction. We assume that
$a^{\mathcal{I}_C} \notin (\exists_{\geq n}\,R)^{\mathcal{I}_C}$. Then there exist at most $m$ ($0 \leq m < n$) distinct S-successors
of $a$ with $S \in R^{\downarrow}$. Two cases can occur: (1) the individual $a$ is not blocked in $\mathcal{I}_C$.
Then we have fewer than $n$ S-successors of $a$ in $\mathcal{A}$, and the R$\exists_{\geq n}$ rule is applicable
to $a$. This contradicts the assumption that $\mathcal{A}$ is complete. (2) $a$ is blocked by an
individual $c$ but the same argument as in case (1) holds and leads to the same
contradiction.

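For $C = \exists_{\leq n}\,R$ we show the goal by contradiction. Suppose that
$a^{\mathcal{I}_C} \notin (\exists_{\leq n}\,R)^{\mathcal{I}_C}$. Then there exist at least $n + 1$ distinct individuals
$b_1^{\mathcal{I}_C}, \ldots, b_{n+1}^{\mathcal{I}_C}$ such that $(a^{\mathcal{I}_C}, b_i^{\mathcal{I}_C}) \in R^{\mathcal{I}_C}$, $i \in 1..n+1$. The following two
cases can occur. (1) The individual $a$ is not blocked: We have $n+1$ assertions $(a,b_i){:}S_i \in \mathcal{A}$
with $S_i \in R^{\downarrow}$ and $S_i \notin \mathbf{T}$, $i \in 1..n+1$. The R$\exists_{\leq n}$ rule cannot be applicable since
$\mathcal{A}$ is complete and the $b_i$ are distinct, i.e. $b_i \neq b_j \in \mathcal{A}$, $i,j \in 1..n+1$, $i \neq j$. This
contradicts the assumption that $\mathcal{A}$ is clash-free. (2) There exists a witness $c$ for $a$
with $(c,b_i){:}S_i \in \mathcal{A}$, $S_i \in R^{\downarrow}$, and $S_i \notin \mathbf{T}$, $i \in 1..n+1$. This leads to an analogous
contradiction. Due to the construction of the canonical interpretation in case of
a blocking condition (with $c$ being the witness) and a non-transitive role $R$ ($R$
is required to be a simple role, see the syntactic restrictions for number restrictions
and role boxes), there is no $(a^{\mathcal{I}_C}, b_k^{\mathcal{I}_C}) \in R^{\mathcal{I}_C}$ if there is no $(c^{\mathcal{I}_C}, b_k^{\mathcal{I}_C}) \in R^{\mathcal{I}_C}$
($k \in 1..n+1$).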

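If $C = \exists f_1, \ldots, f_n\,.\,P$ we show that there exist concrete objects $y_1, \ldots, y_n \in
\Delta_{\mathcal{D}}$ such that $(a^{\mathcal{I}_C}, y_1) \in f_1^{\mathcal{I}_C}, \ldots, (a^{\mathcal{I}_C}, y_n) \in f_n^{\mathcal{I}_C}$ and $(y_1, \ldots, y_n) \in P^{\mathcal{I}_C}$. The
R$\exists$P rule generates assertions $(a,x_1){:}f_1, \ldots, (a,x_n){:}f_n, (x_1, \ldots, x_n){:}P$. Since $\mathcal{A}$
is clash-free there is no concrete domain clash. Hence there exists a variable
assignment $\alpha$ that maps $x_1, \ldots, x_n$ to elements of $\Delta_{\mathcal{D}}$. The conjunction of concrete
domain predicates is satisfiable and $(x_1^{\mathcal{I}_C}, \ldots, x_n^{\mathcal{I}_C}) \in P^{\mathcal{I}_C}$. By definition
of $\mathcal{I}_C$ it holds that $(a^{\mathcal{I}_C}, x_1^{\mathcal{I}_C}) \in f_1^{\mathcal{I}_C}, \ldots, (a^{\mathcal{I}_C}, x_n^{\mathcal{I}_C}) \in f_n^{\mathcal{I}_C}$. Thus, there exist
$y_1, \ldots, y_n$ such that the above-mentioned requirements are fulfilled and therefore
$a^{\mathcal{I}_C} \in (\exists f_1, \ldots, f_n\,.\,P)^{\mathcal{I}_C}$.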

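If $C = \forall f\,.\,\bot_{\mathcal{D}}$ then we show that $a^{\mathcal{I}_C} \in (\forall f\,.\,\bot_{\mathcal{D}})^{\mathcal{I}_C}$. Because $\mathcal{A}$ is clash-free,
there cannot be an assertion $(a,x){:}f \in \mathcal{A}$ for some $x$ in $\mathbf{O}_C$ and an $f \in \mathbf{F}$. Thus,
it does not hold that there exists $(a^{\mathcal{I}_C}, y) \in f^{\mathcal{I}_C}$ and hence $a^{\mathcal{I}_C} \in (\forall f\,.\,\bot_{\mathcal{D}})^{\mathcal{I}_C}$.

If $\forall x\,.\,x{:}D \in \mathcal{A}$, then, due to the completeness of $\mathcal{A}$, for each individual $a$
in $\mathcal{A}$ we have $a{:}D \in \mathcal{A}$ and, by the previous cases, $a^{\mathcal{I}_C} \in D^{\mathcal{I}_C}$. Thus, $\mathcal{I}_C$ satisfies
$\forall x\,.\,x{:}D$. Finally, since $\mathcal{I}_C$ satisfies all assertions in $\mathcal{A}$, $\mathcal{I}_C$ satisfies $\mathcal{A}$.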



Lemma 3 (Completeness). Let $\mathcal{A}_{\mathcal{T}}$ be an augmented ABox and $\mathcal{R}$ be a role box. If
$\mathcal{A}_{\mathcal{T}}$ is consistent, then there exists at least one completion $\mathcal{A}'$ being computed by
applying the completion rules.

Proof. By contraposition: Obviously, an ABox containing a clash is inconsistent.
If there does not exist a completion of $\mathcal{A}_{\mathcal{T}}$, then it follows from Proposition 1
that the ABox $\mathcal{A}_{\mathcal{T}}$ is inconsistent.



Lemma 4 (Termination). The calculus described above terminates on every 
(augmented) input ABox. 

Proof. The termination of the calculus is shown by specifying an upper limit 
on the number of assertions that can result from an (augmented) input ABox 
of a certain length $n$. Compared to $\mathcal{ALCNH}_{R^+}$, in the termination proof for
$\mathcal{ALCNH}_{R^+}(\mathcal{D})^-$ the additional constructs for concrete domains have to be
considered. Basically, since features do not “interact” with value and number
restrictions (see the completion rules), the same upper limit 0(2^”) for a completion
can be derived. For details see [6].



Theorem 1 (Decidability). Let $\mathcal{D}$ be an admissible concrete domain. Checking
whether an $\mathcal{ALCNH}_{R^+}(\mathcal{D})^-$ knowledge base $(\mathcal{T}, \mathcal{A})$ is consistent is a decidable
problem.

Proof. Given a knowledge base $(\mathcal{T}, \mathcal{A})$, an augmented ABox $\mathcal{A}_{\mathcal{T}}$ can be
constructed in linear time. The claim follows from Lemmas 1, 2, 3, and 4.

5 Conclusion 

We presented a tableaux calculus deciding the knowledge base consistency
problem for the description logic $\mathcal{ALCNH}_{R^+}(\mathcal{D})^-$. Applications of the logic in the
context of configuration problems have been sketched. The Cylinder example
demonstrates that some requirements of a model-based configuration system are
fulfilled by $\mathcal{ALCNH}_{R^+}(\mathcal{D})^-$. The calculus presented in this paper can be used
to solve “simple” configuration problems in which the configuration space can be
described by an $\mathcal{ALCNH}_{R^+}(\mathcal{D})^-$ knowledge base (see [6] for an analysis of the
models resulting from the canonical interpretation). We conjecture that concrete
domains without feature chains can also be included in description logics with
inverse roles and qualified number restrictions.

A highly optimized variant of the calculus for the sublogic $\mathcal{ALCNH}_{R^+}$ is
already implemented in the ABox description logic system RACE. RACE is 
available at http://kogs-www.informatik.uni-hamburg.de/~race/. RACE will be 
extended with support for reasoning with concrete domains in the near future. 
With this paper we provide a sound basis for practical extensions of expressive 
DL systems such that, for instance, construction problems can be effectively 
solved with description logic reasoning techniques. 
Abstract. Concrete domains are an extension of Description Logics
(DLs) that allows reasoning about conceptual knowledge to be integrated
with reasoning about “concrete properties” of objects such as sizes,
weights, and durations. It is known that reasoning with $\mathcal{ALC}(\mathcal{D})$, the
basic DL admitting concrete domains, is PSpace-complete. In this paper,
it is shown that the upper bound is not robust: we give three examples of
seemingly harmless extensions of $\mathcal{ALC}(\mathcal{D})$ (namely acyclic TBoxes,
inverse roles, and a role-forming concrete domain constructor) that make
reasoning NExpTime-hard. As a corresponding upper bound, we show
that reasoning with all three extensions together is in NExpTime.



1 Introduction 

Description Logics (DLs) are a family of logical formalisms for the representation 
of and reasoning about conceptual knowledge. The knowledge is represented on 
an abstract logical level, i.e., by means of concepts (unary predicates), roles (bi- 
nary predicates), and logical constructors. This makes it difficult to adequately 
represent knowledge concerning “concrete properties” of real-world entities such 
as their sizes, weights, and durations. Since, for many knowledge representation 
applications, it is essential to integrate reasoning about such concrete properties 
with reasoning about knowledge represented on an abstract logical level, Baader 
and Hanschke extended Description Logics by so-called concrete domains [1]. 
A concrete domain consists of a set called the domain and a set of predicates 
with a fixed interpretation over this domain. For example, one could use the real
numbers as the domain and then define predicates such as the unary “$=_{23}$”, the
binary “<” and “=”, and the ternary “+” and “*” [1]. Or one could use the set
of all intervals over, say, the rationals as the domain and then define “temporal” 
predicates such as during, meets, and before [15]. Baader and Hanschke propose
to extend the basic Description Logic $\mathcal{ALC}$ with concrete domains, which yields
the logic $\mathcal{ALC}(\mathcal{D})$. The interface between $\mathcal{ALC}$ and the concrete domain is
provided by a concrete domain concept constructor. To illustrate the use of concrete
domains for knowledge representation, consider the example $\mathcal{ALC}(\mathcal{D})$-concept

\[
\forall \mathit{subprocess}.\mathit{Drilling} \sqcap \exists \mathit{workpiece}.(\exists \mathit{height}.{=_{5\mathrm{cm}}} \sqcap \exists \mathit{height}, \mathit{length}.{>})
\]

which describes a process all of whose subprocesses are drilling processes and
which involves a workpiece with height 5cm and height strictly greater than its
length. Here, $=_{5\mathrm{cm}}$ is a unary predicate from the concrete domain and > is a
binary predicate. The subconcept in brackets is a conjunction of two concrete
domain concept constructors. Other DLs with concrete domains can be found in
[3,8,12], while applications of such logics are described in [2,8].

In this paper, we are interested in the complexity of reasoning with Description
Logics providing for concrete domains. The complexity of $\mathcal{ALC}(\mathcal{D})$ itself
is determined in [14], where reasoning with $\mathcal{ALC}(\mathcal{D})$ is proved to be
PSpace-complete if reasoning with the concrete domain $\mathcal{D}$ is in PSpace. However, for
many applications, the expressivity of $\mathcal{ALC}(\mathcal{D})$ is not sufficient, which makes it
quite natural to consider extensions of this logic with additional means of
expressivity. We consider three such extensions, all of them frequently used in
the area of Description Logics, and show that, although all these extensions are
seemingly “harmless”, reasoning in the extended logics is considerably harder
than in $\mathcal{ALC}(\mathcal{D})$ itself. Hence, the PSpace upper bound of $\mathcal{ALC}(\mathcal{D})$ cannot be
considered robust.

More precisely, we consider the extension of $\mathcal{ALC}(\mathcal{D})$ with (1) acyclic TBoxes,
(2) inverse roles, and (3) a role-forming concrete domain constructor. TBoxes are
used for representing terminological knowledge and background knowledge of
application domains [5,13], inverse roles are present in most expressive Description
Logics [5,10], and the role-forming constructor is a natural counterpart to the
concept-forming concrete domain constructor [8]. By introducing a
NExpTime-complete variant of the Post Correspondence Problem [17,9], we identify a large
class of concrete domains $\mathcal{D}$ such that reasoning with each of the above three
extensions of $\mathcal{ALC}(\mathcal{D})$ (separately) is NExpTime-hard. This dramatic increase
in complexity is rather surprising since, from a computational point of view, all
of the proposed extensions look harmless. For example, in [13], it is shown that
the extension of many PSpace Description Logics with acyclic TBoxes does
not increase the complexity of reasoning. Moreover, it is well-known that the
extension with inverse roles usually does not change the complexity class. For
example, $\mathcal{ALC}$ extended with inverse roles is still in PSpace [11]. As a
corresponding upper bound, we show that, if reasoning with a concrete domain $\mathcal{D}$ is
in NP, then reasoning with $\mathcal{ALC}(\mathcal{D})$ and all three above extensions
(simultaneously) is in NExpTime. We argue that this upper bound captures a large class
of interesting concrete domains. This paper is accompanied by a technical report
containing full proofs [16].



2 Description Logics with Concrete Domains 



We introduce the Description Logics we are concerned with in the remainder of
this paper. First, $\mathcal{ALCI}(\mathcal{D})$ is defined, which extends $\mathcal{ALC}(\mathcal{D})$ with inverse roles.
In a second step, we add a role-forming concrete domain constructor and obtain
the logic $\mathcal{ALCRPI}(\mathcal{D})$. This two-step approach is pursued since the definition of
$\mathcal{ALCRPI}(\mathcal{D})$ involves some rather unusual syntactic restrictions which we would
like to keep separate from the more straightforward syntax of $\mathcal{ALCI}(\mathcal{D})$.

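Definition 1 (Concrete Domain). A concrete domain $\mathcal{D}$ is a pair $(\Delta_{\mathcal{D}}, \Phi_{\mathcal{D}})$,
where $\Delta_{\mathcal{D}}$ is a set called the domain, and $\Phi_{\mathcal{D}}$ is a set of predicate names. Each
predicate name $P \in \Phi_{\mathcal{D}}$ is associated with an arity $n$ and an $n$-ary predicate
$P^{\mathcal{D}} \subseteq \Delta_{\mathcal{D}}^n$.

With $\overline{P}$, we denote the negation of the predicate $P$, i.e. $\overline{P}^{\mathcal{D}} = \Delta_{\mathcal{D}}^n \setminus P^{\mathcal{D}}$. Based
on concrete domains, we introduce the syntax of $\mathcal{ALCI}(\mathcal{D})$.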

Definition 2 (Syntax). Let $N_C$, $N_R$, and $N_{cF}$ be mutually disjoint sets of
concept names, role names, and concrete feature names, respectively, and let
$N_{aF}$ be a subset of $N_R$. Elements of $N_{aF}$ are called abstract features. The set
of $\mathcal{ALCI}(\mathcal{D})$ roles $\hat{N}_R$ is $N_R \cup \{R^- \mid R \in N_R\}$. An expression $f_1 \cdots f_n g$, where
$f_1, \ldots, f_n \in N_{aF}$ ($n \ge 0$) and $g \in N_{cF}$, is called a path. The set of
$\mathcal{ALCI}(\mathcal{D})$-concepts is the smallest set such that

1. every concept name is a concept

2. if $C$ and $D$ are concepts, $R$ is a role, $g$ is a concrete feature, $P \in \Phi_{\mathcal{D}}$ is
a predicate name with arity $n$, and $u_1, \ldots, u_n$ are paths, then the following
expressions are also concepts: $\neg C$, $C \sqcap D$, $C \sqcup D$, $\exists R.C$, $\forall R.C$, $\exists u_1, \ldots, u_n.P$,
and $g\uparrow$.

An $\mathcal{ALCI}(\mathcal{D})$-concept which uses only roles from $N_R$ is called an
$\mathcal{ALC}(\mathcal{D})$-concept. With $\mathit{sub}(C)$, we denote the set of subconcepts of a concept $C$, which
is defined in the obvious way. Throughout this paper, we denote concept names
with $A$ and $B$, concepts with $C$ and $D$, roles with $R$, abstract features with $f$,
concrete features with $g$, paths with $u$, and predicates with $P$. As usual, we write
$\top$ for $A \sqcup \neg A$, $\bot$ for $A \sqcap \neg A$ (where $A$ is some concept name), and $\exists f_1 \cdots f_n.C$
(resp. $\forall f_1 \cdots f_n.C$) for $\exists f_1. \cdots \exists f_n.C$ (resp. $\forall f_1. \cdots \forall f_n.C$).

The syntactical part of a Description Logic is usually given by a concept lan- 
guage and a so-called TBox formalism. The TBox formalism is used to represent 
terminological knowledge of the application domain. 

Definition 3 (TBoxes). Let $A$ be a concept name and $C$ be a concept. Then
$A \doteq C$ is a concept definition. Let $\mathcal{T}$ be a finite set of concept definitions.
A concept name $A$ directly uses a concept name $B$ in $\mathcal{T}$ if there is a concept
definition $A \doteq C$ in $\mathcal{T}$ such that $B$ appears in $C$. Let uses be the transitive
closure of “directly uses”. $\mathcal{T}$ is called acyclic if there is no concept name $A$ such
that $A$ uses itself in $\mathcal{T}$. If $\mathcal{T}$ is acyclic, and the left-hand sides of all concept
definitions in $\mathcal{T}$ are unique, then $\mathcal{T}$ is called a TBox.
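The acyclicity condition of Definition 3 can be checked mechanically. The following is a minimal sketch, assuming a hypothetical representation in which a set of concept definitions is given as a dictionary from each defined concept name to the set of concept names appearing on its right-hand side (so left-hand sides are unique by construction). The function names are illustrative only.

```python
from typing import Dict, Set


def uses_closure(directly_uses: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return the transitive closure ("uses") of the "directly uses" relation."""
    closure = {a: set(bs) for a, bs in directly_uses.items()}
    changed = True
    while changed:
        changed = False
        for a, reached in closure.items():
            # Everything used by a name that `a` already uses.
            new = set()
            for b in reached:
                new |= closure.get(b, set())
            if not new <= reached:
                reached |= new
                changed = True
    return closure


def is_acyclic(directly_uses: Dict[str, Set[str]]) -> bool:
    """A set of definitions is acyclic iff no concept name uses itself."""
    closure = uses_closure(directly_uses)
    return all(a not in reached for a, reached in closure.items())


# Hypothetical example: A uses B, and B uses C.
acyclic_defs = {"A": {"B"}, "B": {"C"}}
# Hypothetical example: A and B use each other.
cyclic_defs = {"A": {"B"}, "B": {"A"}}
```

A fixpoint iteration is used here for brevity; a depth-first search would detect cycles in linear time.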

TBoxes can be thought of as sets of macro definitions, i.e., the left-hand side of 
every concept definition is an abbreviation for the right-hand side of the concept 
definition. There also exist more general TBox formalisms allowing for arbitrary 
equations over concepts [5,10]. However, we will see that admitting these general 
TBoxes makes reasoning with $\mathcal{ALC}(\mathcal{D})$ (and hence also $\mathcal{ALCI}(\mathcal{D})$) undecidable.
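Definition 4 (Semantics). An interpretation $\mathcal{I}$ is a pair $(\Delta_{\mathcal{I}}, \cdot^{\mathcal{I}})$, where $\Delta_{\mathcal{I}}$
is a set called the domain and $\cdot^{\mathcal{I}}$ the interpretation function. The interpretation
function maps each concept name $C$ to a subset $C^{\mathcal{I}}$ of $\Delta_{\mathcal{I}}$, each role name $R$ to
a subset $R^{\mathcal{I}}$ of $\Delta_{\mathcal{I}} \times \Delta_{\mathcal{I}}$, each abstract feature $f$ to a partial function $f^{\mathcal{I}}$ from
$\Delta_{\mathcal{I}}$ to $\Delta_{\mathcal{I}}$, and each concrete feature $g$ to a partial function $g^{\mathcal{I}}$ from $\Delta_{\mathcal{I}}$ to $\Delta_{\mathcal{D}}$.
If $u = f_1 \cdots f_n g$ is a path, then $u^{\mathcal{I}}(a)$ is defined as $g^{\mathcal{I}}(f_n^{\mathcal{I}}(\cdots(f_1^{\mathcal{I}}(a))\cdots))$. The
interpretation function is extended to arbitrary roles and concepts as follows:

\[
\begin{aligned}
(R^-)^{\mathcal{I}} &:= \{(a,b) \mid (b,a) \in R^{\mathcal{I}}\} \\
(C \sqcap D)^{\mathcal{I}} &:= C^{\mathcal{I}} \cap D^{\mathcal{I}} \qquad (C \sqcup D)^{\mathcal{I}} := C^{\mathcal{I}} \cup D^{\mathcal{I}} \qquad (\neg C)^{\mathcal{I}} := \Delta_{\mathcal{I}} \setminus C^{\mathcal{I}} \\
(\exists R.C)^{\mathcal{I}} &:= \{a \in \Delta_{\mathcal{I}} \mid \{b \mid (a,b) \in R^{\mathcal{I}}\} \cap C^{\mathcal{I}} \neq \emptyset\} \\
(\forall R.C)^{\mathcal{I}} &:= \{a \in \Delta_{\mathcal{I}} \mid \{b \mid (a,b) \in R^{\mathcal{I}}\} \subseteq C^{\mathcal{I}}\} \\
(\exists u_1, \ldots, u_n.P)^{\mathcal{I}} &:= \{a \in \Delta_{\mathcal{I}} \mid (u_1^{\mathcal{I}}(a), \ldots, u_n^{\mathcal{I}}(a)) \in P^{\mathcal{D}}\} \\
(g\uparrow)^{\mathcal{I}} &:= \{a \in \Delta_{\mathcal{I}} \mid g^{\mathcal{I}}(a) \text{ undefined}\}
\end{aligned}
\]

An interpretation $\mathcal{I}$ is called a model for a concept $C$ iff $C^{\mathcal{I}} \neq \emptyset$ and a model
for a TBox $\mathcal{T}$ iff $A^{\mathcal{I}} = C^{\mathcal{I}}$ for all $A \doteq C \in \mathcal{T}$.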
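To make the semantics of Definition 4 concrete, the following sketch computes concept extensions over a finite interpretation. Everything about the encoding is an assumption made for illustration: concepts are nested tuples, an interpretation is a dictionary, an inverse role is written `("inv", R)`, concrete-domain predicates are plain Python callables, and `None` stands for "undefined". It is not an algorithm from the paper (which is concerned with satisfiability, not model checking).

```python
def role_ext(role, interp):
    """Extension of a role name, an abstract feature, or an inverse ("inv", R)."""
    if isinstance(role, tuple):  # ("inv", R)
        return {(b, a) for (a, b) in role_ext(role[1], interp)}
    if role in interp["features"]:  # abstract features are functional roles
        return set(interp["features"][role].items())
    return set(interp["roles"][role])


def path_value(path, a, interp):
    """Value of the path f1 ... fn g at object a, or None if undefined."""
    *abstract, g = path
    for f in abstract:
        a = interp["features"][f].get(a)
        if a is None:
            return None
    return interp["concrete"][g].get(a)


def ext(c, interp):
    """Extension of a concept, following the clauses of Definition 4."""
    dom = interp["domain"]
    op = c[0]
    if op == "name":
        return set(interp["concepts"][c[1]])
    if op == "not":
        return dom - ext(c[1], interp)
    if op == "and":
        return ext(c[1], interp) & ext(c[2], interp)
    if op == "or":
        return ext(c[1], interp) | ext(c[2], interp)
    if op in ("exists", "forall"):
        succ = {a: set() for a in dom}
        for (a, b) in role_ext(c[1], interp):
            succ.setdefault(a, set()).add(b)
        filler = ext(c[2], interp)
        if op == "exists":
            return {a for a in dom if succ[a] & filler}
        return {a for a in dom if succ[a] <= filler}
    if op == "pred":  # ("pred", [path_1, ..., path_n], predicate)
        values = {a: [path_value(p, a, interp) for p in c[1]] for a in dom}
        return {a for a in dom if None not in values[a] and c[2](*values[a])}
    if op == "undef":  # g "undefined"
        return {a for a in dom if interp["concrete"][c[1]].get(a) is None}
    raise ValueError(f"unknown constructor: {op}")


# Hypothetical model of the drilling example from the introduction.
interp = {
    "domain": {"p", "s", "w"},
    "concepts": {"Drilling": {"s"}},
    "roles": {"subprocess": {("p", "s")}, "workpiece": {("p", "w")}},
    "features": {},
    "concrete": {"height": {"w": 5}, "length": {"w": 3}},
}
drilling = (
    "and",
    ("forall", "subprocess", ("name", "Drilling")),
    ("exists", "workpiece", (
        "and",
        ("pred", [("height",)], lambda h: h == 5),
        ("pred", [("height",), ("length",)], lambda h, l: h > l),
    )),
)
```

Under these assumptions, `ext(drilling, interp)` contains exactly the process object `"p"`, so `interp` is a model of the example concept.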

We call elements from $\Delta_{\mathcal{I}}$ abstract objects and elements from $\Delta_{\mathcal{D}}$ concrete
objects. Our definition of $\mathcal{ALC}(\mathcal{D})$ differs slightly from the original version in [1]:
Instead of separating concrete and abstract features, Baader and Hanschke
define only one type of feature which is interpreted as a partial function from $\Delta_{\mathcal{I}}$
to $\Delta_{\mathcal{I}} \cup \Delta_{\mathcal{D}}$. We choose the separated approach since it allows clearer proofs.
Moreover, it is not hard to see that the combined features can be “simulated”
using pairs of concrete and abstract features.

Definition 5 (Inference Problems). Let $C$ and $D$ be concepts. $C$
subsumes $D$ w.r.t. a TBox $\mathcal{T}$ (written $D \sqsubseteq_{\mathcal{T}} C$) iff $D^{\mathcal{I}} \subseteq C^{\mathcal{I}}$ for all models $\mathcal{I}$
of $\mathcal{T}$. $C$ is satisfiable w.r.t. a TBox $\mathcal{T}$ iff there exists a model of both $\mathcal{T}$ and $C$.

Both inferences are also considered without reference to TBoxes, i.e., with
reference to the empty TBox. It is well-known that (un)satisfiability and subsumption
can be mutually reduced to each other: $C \sqsubseteq_{\mathcal{T}} D$ iff $C \sqcap \neg D$ is unsatisfiable w.r.t.
$\mathcal{T}$, and $C$ is satisfiable w.r.t. $\mathcal{T}$ iff we do not have $C \sqsubseteq_{\mathcal{T}} \bot$. We call two concepts
$C$ and $D$ equivalent iff $C$ subsumes $D$ and $D$ subsumes $C$.

Let us now further extend $\mathcal{ALCI}(\mathcal{D})$ with a role-forming concrete domain
constructor, i.e., with a constructor that allows the definition of complex roles 
with reference to the concrete domain. Such a constructor was first defined in [8], 
where it is motivated as an appropriate tool for spatial reasoning. 

Definition 6 ($\mathcal{ALCRPI}(\mathcal{D})$ Syntax and Semantics). A predicate role is an
expression of the form $\exists(u_1, \ldots, u_n), (v_1, \ldots, v_m).P$ where $P$ is an $(n+m)$-ary
predicate. The semantics of predicate roles is

\[
(\exists(u_1, \ldots, u_n), (v_1, \ldots, v_m).P)^{\mathcal{I}} := \{(a,b) \in \Delta_{\mathcal{I}} \times \Delta_{\mathcal{I}} \mid
(u_1^{\mathcal{I}}(a), \ldots, u_n^{\mathcal{I}}(a), v_1^{\mathcal{I}}(b), \ldots, v_m^{\mathcal{I}}(b)) \in P^{\mathcal{D}}\}.
\]

With $\mathcal{R}$, we denote the set of predicate roles. The set of $\mathcal{ALCRPI}(\mathcal{D})$ roles $\hat{\mathcal{R}}$
is defined as $\hat{N}_R \cup \mathcal{R} \cup \{R^- \mid R \in \mathcal{R}\}$. A role which is either a predicate role or
the inverse of a predicate role is called a complex role. An $\mathcal{ALCI}(\mathcal{D})$-concept with
roles from $\hat{N}_R \setminus N_{aF}$ replaced with roles from $\hat{\mathcal{R}}$ is called an $\mathcal{ALCRPI}(\mathcal{D})$-concept.

An $\mathcal{ALCRPI}(\mathcal{D})$-concept not using the inverse role constructor is called an
$\mathcal{ALCRP}(\mathcal{D})$-concept. For example, the following is an $\mathcal{ALCRPI}(\mathcal{D})$-concept

\[
\mathit{Error} \sqcap \exists \mathit{time}, \mathit{next}\ \mathit{time}.{<} \sqcap \forall \mathit{next}.\forall(\exists(\mathit{time}), (\mathit{time}).{<})^-.\neg \mathit{Error}
\]

where Error is a concept, time is a concrete feature and next is an abstract
feature. This concept is unsatisfiable since every domain object satisfying it
would have to be both in Error and $\neg$Error, which is impossible. In [7], it is
proved that satisfiability of $\mathcal{ALCRP}(\mathcal{D})$-concepts is undecidable. However, as
shown in [8], there exists a decidable fragment of $\mathcal{ALCRP}(\mathcal{D})$ that is still a useful
extension of $\mathcal{ALC}(\mathcal{D})$. In the following, we introduce an analogous fragment of
the logic $\mathcal{ALCRPI}(\mathcal{D})$.

Definition 7 (Restricted $\mathcal{ALCRPI}(\mathcal{D})$-concept). Let $C$ be an
$\mathcal{ALCRPI}(\mathcal{D})$-concept, and $\mathit{sub}(C)$ the set of subconcepts of $C$. Then $C$ is
called restricted iff it fulfills the following conditions:

1. For any $\forall R.D \in \mathit{sub}(C)$, where $R$ is a complex role, $\mathit{sub}(D)$ does not contain
any concepts of the form $\exists u_1, \ldots, u_n.P$ or $\exists S.E$, where $S$ is a complex role.

2. For any $\exists R.D \in \mathit{sub}(C)$, where $R$ is a complex role, $\mathit{sub}(D)$ does not contain
any concepts of the form $\exists u_1, \ldots, u_n.P$ or $\forall S.E$, where $S$ is a complex role.

Intuitively, these restrictions enforce the finite model property, which leads to
decidability; see [8,16] for details. In the remainder of this paper, we assume all
$\mathcal{ALCRPI}(\mathcal{D})$-concepts to be restricted without further notice. Note that the
set of restricted $\mathcal{ALCRPI}(\mathcal{D})$-concepts is closed under negation, and, hence,
subsumption can be reduced to satisfiability.

3 A NExpTime-Complete Variant of the PCP

The Post Correspondence Problem (PCP), as introduced in 1946 by Emil Post [17],
is an undecidable problem frequently employed in undecidability proofs. In this
section, we define a NExpTime-complete variant of the PCP together with a
concrete domain $\mathcal{W}$ that is suitable for reducing PCPs to the satisfiability problem
of Description Logics with concrete domains.

Definition 8 (PCP). A Post Correspondence Problem (PCP) $P$ is given by a
finite, non-empty list $(\ell_1, r_1), \ldots, (\ell_k, r_k)$ of pairs of non-empty words over some
alphabet $\Sigma$. A sequence of integers $i_1, \ldots, i_m$, with $m \ge 1$, is called a solution
for $P$ iff $\ell_{i_1} \cdots \ell_{i_m} = r_{i_1} \cdots r_{i_m}$. Let $f(n)$ be a mapping from $\mathbb{N}$ to $\mathbb{N}$ and let $|P|$
denote the sum of the lengths of all words in the PCP $P$. A solution $i_1, \ldots, i_m$
for $P$ is called an $f(n)$-solution iff $m \le f(|P|)$. With $f(n)$-PCP, we denote the
version of the PCP that admits only $f(n)$-solutions.
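Definition 8 can be illustrated by a small checker. This is a minimal sketch under assumed conventions: a PCP is a list of `(left, right)` string pairs, solutions use 1-based indices as in the definition, and the function names are hypothetical. The sample instance is a standard textbook-style example, not one taken from the paper.

```python
def pcp_size(pairs):
    """|P|: the sum of the lengths of all words in the PCP."""
    return sum(len(left) + len(right) for left, right in pairs)


def is_solution(pairs, indices):
    """Check that the index sequence (1-based, m >= 1) makes both sides equal."""
    if not indices:
        return False
    left = "".join(pairs[i - 1][0] for i in indices)
    right = "".join(pairs[i - 1][1] for i in indices)
    return left == right


def is_bounded_solution(pairs, indices, f):
    """Check that the sequence is an f(n)-solution, i.e. m <= f(|P|)."""
    return is_solution(pairs, indices) and len(indices) <= f(pcp_size(pairs))


# Illustrative instance with |P| = 13 and the solution 3, 2, 3, 1.
example_pcp = [("a", "baa"), ("ab", "aa"), ("bba", "bb")]
example_solution = [3, 2, 3, 1]
```

For the variant used in this paper, `f` would be `lambda n: 2 ** n + 1`. Checking a given sequence is cheap; the hardness lies in the exponentially long sequences that may have to be considered.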
Analogous to the undecidability result for the general PCP given by Hopcroft
and Ullman in [9], we may prove the following result.

Theorem 1. It is NExpTime-complete to decide whether a $2^n + 1$-PCP has a
solution.

Hence, a reduction of the $2^n + 1$-PCP is a candidate for proving NExpTime
lower bounds for Description Logics with concrete domains. As we will see now,
the problem is in fact well-suited for this task since it is possible to define an
appropriate concrete domain. It follows from the proof of the above theorem
that it is sufficient to consider some fixed, finite alphabet $\Sigma_u$ whose cardinality
is the number of symbols needed to define a universal Turing machine.

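Definition 9 (Concrete Domain $\mathcal{W}$). The concrete domain $\mathcal{W}$ is defined by
setting $\Delta_{\mathcal{W}} := \Sigma_u^*$ and defining $\Phi_{\mathcal{W}}$ as the smallest set containing the following
predicates:

— unary predicates word and nword with $\mathit{word}^{\mathcal{W}} = \Delta_{\mathcal{W}}$ and $\mathit{nword}^{\mathcal{W}} = \emptyset$,

— unary predicates $=_\epsilon$ and $\neq_\epsilon$ with $=_\epsilon^{\mathcal{W}} = \{\epsilon\}$ and $\neq_\epsilon^{\mathcal{W}} = \Sigma_u^+$,

— a binary equality predicate $=$ and a binary inequality predicate $\neq$, and

— for each $w \in \Sigma_u^+$, two binary predicates $\mathit{conc}_w$ and $\mathit{nconc}_w$ with

$\mathit{conc}_w^{\mathcal{W}} = \{(u, v) \mid v = uw\}$ and $\mathit{nconc}_w^{\mathcal{W}} = \{(u, v) \mid v \neq uw\}$.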
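The predicates of Definition 9 are simple enough to write down directly. The sketch below models words as Python strings and predicates as Boolean functions. The function names are illustrative and are not part of the paper.

```python
def word(u):
    """word holds for every element of the domain."""
    return True


def nword(u):
    """nword is the negation of word and holds for no element."""
    return False


def eq_eps(u):
    """The unary predicate 'equals the empty word'."""
    return u == ""


def neq_eps(u):
    """The unary predicate 'is a non-empty word'."""
    return u != ""


def conc(w):
    """conc_w relates u and v iff v is u followed by w."""
    return lambda u, v: v == u + w


def nconc(w):
    """nconc_w is the negation of conc_w."""
    return lambda u, v: v != u + w
```

Each predicate comes paired with its negation, which reflects that the predicate set is closed under negation.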

The complexity of reasoning with a Description Logic providing a concrete
domain $\mathcal{D}$ obviously depends on the complexity of reasoning with $\mathcal{D}$. More
precisely, most satisfiability algorithms involve checking the satisfiability of
finite conjunctions of concrete domain predicates

\[
\bigwedge_{1 \le i \le k} P_i(x_1^{(i)}, \ldots, x_{n_i}^{(i)})
\]

where each $P_i$ is an $n_i$-ary predicate and the $x_j^{(i)}$ are variables from some fixed
set [1]. This is also the case for the tableau algorithm that is used to prove the
upper bound in Section 7. Hence, we are interested in the complexity of this task,
which is called $\mathcal{D}$-satisfiability in what follows. By devising an algorithm that is
based on repeated normalization combined with tests for obvious inconsistencies,
the following result can be obtained.

Proposition 1. $\mathcal{W}$-satisfiability is decidable in deterministic polynomial time.

At first sight, the concrete domain $\mathcal{W}$ may look somewhat unnatural in the
context of knowledge representation. However, it is straightforward to encode
words as natural numbers and to define the operations on words as rather simple
operations on the naturals [2]: Words over the alphabet $\Sigma_u$ can be interpreted
as numbers written at base $|\Sigma_u| + 1$ (assuming that the empty word represents
0); the concatenation of two words $v$ and $w$ can then be expressed as $vw =
v \cdot (|\Sigma_u| + 1)^{|w|} + w$, where $|w|$ denotes the length of the word $w$. Hence, each
concrete domain $(\Delta, \Phi)$, where $\Delta$ contains the natural numbers and $\Phi$ contains
predicates for (in)equality, (in)equality to zero, addition, and multiplication, may
also be used for the reductions. A concrete domain with these properties is called
arithmetic.
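The encoding of words as natural numbers can be tried out on a small scale. The following sketch assumes a hypothetical two-letter alphabet in place of $\Sigma_u$ and maps the $i$-th symbol to the digit $i$ (digits start at 1, so the digit 0 is never used and the empty word is the only word encoded as 0).

```python
from itertools import product

SIGMA = "ab"           # hypothetical stand-in for the alphabet
BASE = len(SIGMA) + 1  # numbers are written at base |Sigma| + 1


def encode(word):
    """Read a word as a number at base BASE, using the digits 1..|Sigma|."""
    n = 0
    for ch in word:
        n = n * BASE + SIGMA.index(ch) + 1
    return n


def concat_encoded(v_num, w_num, w_len):
    """Number of the word vw, computed from the numbers of v and w and |w|."""
    return v_num * BASE ** w_len + w_num


def all_words(max_len):
    """All words over SIGMA of length at most max_len."""
    return ["".join(t) for k in range(max_len + 1) for t in product(SIGMA, repeat=k)]
```

For instance, with this alphabet `encode("ab")` is `1 * 3 + 2 = 5`, and concatenating `"ab"` with `"b"` gives `5 * 3 + 2 = 17`, which is also `encode("abb")`. Because no digit is 0, distinct words receive distinct numbers.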
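\[
\begin{aligned}
Ch[u_1, u_2, u_3, u_4] &= (\exists(u_1, u_2).{=} \sqcap \exists(u_3, u_4).{=}) \\
&\quad \sqcup \bigsqcup_{(\ell_i, r_i) \text{ in } P} (\exists(u_1, u_2).\mathit{conc}_{\ell_i} \sqcap \exists(u_3, u_4).\mathit{conc}_{r_i}) \\
C_0 &\doteq \exists \ell.C_1 \sqcap \exists r.C_1 \\
&\quad \sqcap Ch[\ell r^{n-1} g_\ell, r \ell^{n-1} g_\ell, \ell r^{n-1} g_r, r \ell^{n-1} g_r] \\
&\;\;\vdots \\
C_{n-2} &\doteq \exists \ell.C_{n-1} \sqcap \exists r.C_{n-1} \\
&\quad \sqcap Ch[\ell r g_\ell, r \ell g_\ell, \ell r g_r, r \ell g_r] \\
C_{n-1} &\doteq Ch[\ell g_\ell, r g_\ell, \ell g_r, r g_r] \\
C_P &\doteq C_0 \\
&\quad \sqcap \exists \ell^n g_\ell.{=_\epsilon} \sqcap \exists \ell^n g_r.{=_\epsilon} \\
&\quad \sqcap \exists r^n y g_\ell, r^n y g_r.{=} \sqcap \exists r^n y g_\ell.{\neq_\epsilon} \\
&\quad \sqcap Ch[r^n g_\ell, r^n x g_\ell, r^n g_r, r^n x g_r] \\
&\quad \sqcap Ch[r^n x g_\ell, r^n y g_\ell, r^n x g_r, r^n y g_r]
\end{aligned}
\]

Fig. 1. The $\mathcal{ALC}(\mathcal{D})$ reduction TBox $\mathcal{T}_P$ ($n = |P|$).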

4 Satisfiability of $\mathcal{ALC}(\mathcal{D})$-Concepts w.r.t. TBoxes

In this section, we show that the satisfiability of $\mathcal{ALC}(\mathcal{D})$-concepts w.r.t. TBoxes
is NExpTime-hard. As already mentioned, this result is rather surprising since
(1) satisfiability of $\mathcal{ALC}(\mathcal{D})$-concepts without reference to TBoxes is known to
be PSpace-complete if reasoning with the concrete domain $\mathcal{D}$ is in PSpace [14],
and (2) admitting acyclic TBoxes does “usually” not increase the complexity of
reasoning [13].

The proof is by a reduction of the $2^n + 1$-PCP using the concrete domain $\mathcal{W}$
introduced in the previous section. Given a $2^n + 1$-PCP $P = (\ell_1, r_1), \ldots, (\ell_k, r_k)$,
we define a TBox $\mathcal{T}_P$ of size polynomial in $|P|$ and a concept (name) $C_P$ such that
$C_P$ is satisfiable w.r.t. $\mathcal{T}_P$ iff $P$ has a solution. Figure 1 contains the reduction
TBox and Figure 2 an example model for $|P| = 2$. In the figures, $\ell$, $r$, $x$, and
$y$ denote abstract features and $g_\ell$ and $g_r$ denote concrete features. The first
equality in Figure 1 is not a concept definition but an abbreviation: Replace
every occurrence of $Ch[u_1, u_2, u_3, u_4]$ in the lower three concept definitions by
the right-hand side of the first identity, substituting $u_1, \ldots, u_4$ appropriately.

Fig. 2. An example model of $C_P$ and $\mathcal{T}_P$ for $n = 2$. Arrows denote equality
or $\mathit{conc}_w$ for some $w$.

The idea behind the reduction is to define $\mathcal{T}_P$ such that models of $C_P$ and
$\mathcal{T}_P$ have the form of a binary tree of depth $|P|$ whose leaves are connected by
two “chains” of $\mathit{conc}_w$ predicates. Pairs of corresponding objects $(x_i, y_i)$ on the
chains represent partial solutions of the PCP $P$. More precisely, the first line
of the definitions of the $C_0, \ldots, C_{n-1}$ concepts ensures that models have the
form of a binary tree of depth $n$ (with $n = |P|$) whose left edges are labeled
with the abstract feature $\ell$ and whose right edges are labeled with the abstract
feature $r$. Let the abstract objects $a_{n,0}, \ldots, a_{n,2^n-1}$ be the leaves of this tree. By
the second line of the definitions of the $C_0, \ldots, C_{n-1}$ concepts, every $a_{n,i}$ has
a $g_\ell$-successor $x_i$ and a $g_r$-successor $y_i$. These second lines also ensure that the
$x_i$ and $y_i$ objects are connected via two predicate chains, where the predicates
on the chains are either equality or $\mathit{conc}_w$. More precisely, for $0 \le i < 2^n - 1$,
either $x_i = x_{i+1}$ and $y_i = y_{i+1}$, or there exists a $j \in \{1, \ldots, k\}$ such that
$(x_i, x_{i+1}) \in \mathit{conc}_{\ell_j}^{\mathcal{W}}$ and $(y_i, y_{i+1}) \in \mathit{conc}_{r_j}^{\mathcal{W}}$. Furthermore, by the second line of
the definition of $C_P$, we have $x_0 = y_0 = \epsilon$. Hence, pairs $(x_i, y_i)$ are partial
solutions for $P$. Since we must consider solutions of a length up to $2^n + 1$, the $2^n$
objects on the fringe of the tree with their $2^n - 1$ connecting predicate edges are
not sufficient, and we need to “add” two more objects $a_{n,2^n}$ and $a_{n,2^n+1}$ which
behave analogously to the objects $a_{n,0}, \ldots, a_{n,2^n-1}$. This is done by the last two
lines of the definition of $C_P$. Finally, the third line of the definition of $C_P$ ensures
that $x_{2^n+1} = y_{2^n+1} \neq \epsilon$ and hence that $(x_{2^n+1}, y_{2^n+1})$ is in fact a full solution.
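The two predicate chains can be mimicked outside the logic. The sketch below only illustrates how a solution of length at most $2^n + 1$ gives rise to chain objects $x_0, \ldots, x_{2^n+1}$ and $y_0, \ldots, y_{2^n+1}$; it is not part of the reduction itself. The function name, the list-of-string-pairs representation, and the choice to put all equality edges at the end of the chain are assumptions made for the example.

```python
def predicate_chains(pairs, indices, n):
    """Build x_0..x_{2^n+1} and y_0..y_{2^n+1} from a solution (1-based indices).

    Each of the 2^n + 1 chain edges is either a conc edge, which appends the
    words of one PCP pair, or an equality edge, which leaves both words as is.
    """
    edges = 2 ** n + 1
    if len(indices) > edges:
        raise ValueError("solution is longer than 2^n + 1")
    xs, ys = [""], [""]  # x_0 = y_0 = empty word
    for step in range(edges):
        if step < len(indices):
            left, right = pairs[indices[step] - 1]  # conc edge
        else:
            left, right = "", ""  # equality edge
        xs.append(xs[-1] + left)
        ys.append(ys[-1] + right)
    return xs, ys


# Illustrative instance and solution; n is chosen small to keep the chain short.
chain_pcp = [("a", "baa"), ("ab", "aa"), ("bba", "bb")]
chain_xs, chain_ys = predicate_chains(chain_pcp, [3, 2, 3, 1], n=2)
```

The final pair `(chain_xs[-1], chain_ys[-1])` consists of two equal non-empty words, which is exactly what the third line of the definition of $C_P$ requires of the last pair on the chains.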

Obviously, the size of $\mathcal{T}_P$ is polynomial in $|P|$ and $\mathcal{T}_P$ can be constructed in
time polynomial in $|P|$ which, together with the fact that $\mathcal{W}$ may be replaced by
any arithmetic concrete domain, yields the following theorem.

Theorem 2. For every arithmetic concrete domain $\mathcal{D}$, satisfiability of
$\mathcal{ALC}(\mathcal{D})$-concepts w.r.t. TBoxes is NExpTime-hard.

We also obtain a lower bound for subsumption since satisfiability can be reduced
to subsumption. With some slight modifications, the reduction just presented
can also be applied to the Description Logic $\mathcal{ALCR}(\mathcal{D})$, i.e., $\mathcal{ALC}(\mathcal{D})$ enriched
with a role conjunction constructor [6]. Hence, reasoning with this logic is also
NExpTime-hard. The corresponding reduction concept can be found in [16].

One may ask why we are interested in the relatively weak acyclic TBoxes 
instead of using a more general TBox formalism. The answer is that using general 
TBoxes leads to undecidability. 

Definition 10 (General TBox). A general concept inclusion (GCI) has the
form $C \sqsubseteq D$, where both $C$ and $D$ are concepts. An interpretation $\mathcal{I}$ is a model
for a GCI $C \sqsubseteq D$ iff $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$. Finite sets of GCIs are called general TBoxes.
An interpretation $\mathcal{I}$ is a model for a general TBox $\mathcal{T}$ iff $\mathcal{I}$ is a model for all
GCIs in $\mathcal{T}$.

Fig. 3. An example model of $C_P$ w.r.t. $\mathcal{T}_P$.

Using the concrete domain $\mathcal{W}$ and a reduction of the general PCP, the following
theorem can be obtained.

Theorem 3. For every arithmetic concrete domain $\mathcal{D}$, satisfiability of
$\mathcal{ALC}(\mathcal{D})$-concepts w.r.t. general TBoxes is undecidable.

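Proof. Let $P$ be an instance of the PCP. Define a concept $C_P$ and a general
TBox $\mathcal{T}_P$ as follows:

\[
\begin{aligned}
C_P &:= \exists g_\ell.{=_\epsilon} \sqcap \exists g_r.{=_\epsilon} \\
\mathcal{T}_P &:= \Big\{ \top \sqsubseteq \bigsqcap_{(\ell_i, r_i) \text{ in } P} \big(\exists f_i.\top \sqcap \exists g_\ell, f_i g_\ell.\mathit{conc}_{\ell_i} \sqcap \exists g_r, f_i g_r.\mathit{conc}_{r_i}\big), \\
&\qquad\; \top \sqsubseteq \exists g_\ell.{=_\epsilon} \sqcup \neg \exists g_\ell, g_r.{=} \Big\}
\end{aligned}
\]

An example model of $C_P$ w.r.t. $\mathcal{T}_P$ can be found in Figure 3. The first GCI
ensures that models of $C_P$ and $\mathcal{T}_P$ represent all possible solutions of the PCP $P$.
Additionally, the last GCI ensures that no potential solution is a solution. It
is hence straightforward to prove that $C_P$ is satisfiable w.r.t. $\mathcal{T}_P$ iff $P$ has no
solution, i.e., we have reduced the general, undecidable PCP [17,9] to the
satisfiability of $\mathcal{ALC}(\mathcal{D})$-concepts w.r.t. general TBoxes.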



5 Satisfiability of $\mathcal{ALCI}(\mathcal{D})$-Concepts

We now show that satisfiability of $\mathcal{ALCI}(\mathcal{D})$-concepts (without reference to
TBoxes) is NExpTime-hard. As in the previous section, it is surprising that
a rather small change in the logic, i.e., adding inverse roles, causes a dramatic
increase in complexity.

Fig. 4. Predicate chains in models of $C_P$.



The reduction is similar to the one used in the previous section: it is a
reduction of the $2^n + 1$-PCP and uses the concrete domain $\mathcal{W}$. However, we need a
slightly different strategy since, in the case of inverse roles, it is not possible to
enforce chains of predicates connecting the leaves of the tree. Instead, the
predicate chains emulate the structure of the tree following the scheme indicated in
Figure 4. Given a PCP $P = (\ell_1, r_1), \ldots, (\ell_k, r_k)$, we define a concept $C_P$ of size
polynomial in $|P|$ which has a model iff $P$ has a solution. The concept $C_P$ can be
found in Figure 5. In the figure, $h_\ell, h_r, x_\ell, x_r, y_\ell, y_r, z_\ell$, and $z_r$ are concrete
features. Note that the equalities are not concept definitions but abbreviations. As
in the previous section, replace every occurrence of $Ch[u_1, u_2, u_3, u_4]$ in the lower
three concept definitions by the right-hand side of the first identity, substituting
$u_1, \ldots, u_4$ appropriately, and similarly for every occurrence of $X$.

Let us discuss the structure of models of $C_P$. Due to the first line in the
definition of $C_P$ and the $\exists f^-$ quantifiers in the definition of $X$, models of $C_P$
have the form of a tree of depth $|P| - 1$ in which all edges are labeled with $f^-$. This
edge labelling scheme is possible since the inverse of an abstract feature is not a
feature. Additionally, we establish two chains of concrete domain predicates as
indicated in Figure 4. Again, corresponding objects on the two chains represent
partial solutions of the PCP $P$. A more detailed clipping from a model of $C_P$
can be found in Figure 6. The existence of the chains is ensured by the definition
of $X$ and the second line in the definition of $C_P$: The concept $X$ establishes the
edges of the predicate chains as depicted in Figure 6 (in fact, Figure 6 is a model
of the concept $X$) while the second line of $C_P$ establishes the edges “leading
around” the leaves. Edges of the latter type and the dotted edges in Figure 6 are
labeled with the equality predicate. To see why this is the case, let us investigate
the length of the chains.

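\[
\begin{aligned}
Ch[u_1, u_2, u_3, u_4] &= (\exists(u_1, u_2).{=} \sqcap \exists(u_3, u_4).{=}) \\
&\quad \sqcup \bigsqcup_{(\ell_i, r_i) \text{ in } P} (\exists(u_1, u_2).\mathit{conc}_{\ell_i} \sqcap \exists(u_3, u_4).\mathit{conc}_{r_i}) \\
X &= \exists f^-.(Ch[f g_\ell, g_\ell, f g_r, g_r] \sqcap \exists(h_\ell, f p_\ell).{=} \sqcap \exists(h_r, f p_r).{=}) \\
&\quad \sqcap \exists f^-.(Ch[f p_\ell, g_\ell, f p_r, g_r] \sqcap \exists(h_\ell, f h_\ell).{=} \sqcap \exists(h_r, f h_r).{=}) \\
C_P &= X \sqcap \forall f^-.X \sqcap \cdots \sqcap \forall (f^-)^{n-1}.X \\
&\quad \sqcap \forall (f^-)^n.(\exists(g_\ell, h_\ell).{=} \sqcap \exists(g_r, h_r).{=}) \\
&\quad \sqcap Ch[h_\ell, x_\ell, h_r, x_r] \\
&\quad \sqcap Ch[x_\ell, y_\ell, x_r, y_r] \\
&\quad \sqcap Ch[y_\ell, z_\ell, y_r, z_r] \\
&\quad \sqcap \exists g_\ell.{=_\epsilon} \sqcap \exists g_r.{=_\epsilon} \\
&\quad \sqcap \exists z_\ell, z_r.{=} \sqcap \exists z_\ell.{\neq_\epsilon}
\end{aligned}
\]

Fig. 5. The $\mathcal{ALCI}(\mathcal{D})$ reduction concept $C_P$ ($n = |P| - 1$).

Fig. 6. A clipping from a model of $C_P$.

The length of the two predicate chains is twice the number of edges in the
tree plus the number of leaves, i.e., $2 \cdot (2^{|P|} - 2) + 2^{|P|-1}$. To eliminate the factor
2 and the summand $2^{|P|-1}$, $C_P$ is defined such that every edge in the predicate
chains leading “up” in the tree and every edge “leading around” a leaf is labeled
with the equality predicate. To extend the chains to length $2^{|P|} + 1$, we need to
add three additional edges (definition of $C_P$, lines three, four, and five). Finally,
the last two lines in the definition of $C_P$ ensure that the first concrete object on
both chains represents the empty word and that the last objects on the chains
represent a (non-empty) solution for $P$.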
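The edge count used in this argument can be checked numerically. The sketch below assumes a full binary tree of depth $|P| - 1$, as described above, and recomputes the chain length, the number of edges that are not forced to be equality edges, and the length after adding the three extra edges. The function name and return convention are illustrative only.

```python
def chain_lengths(size):
    """Return (total, usable, extended) for a PCP of size |P| = size.

    total    : length of each predicate chain running around the tree
    usable   : edges not forced to be equality edges (those leading "down")
    extended : usable edges plus the three additional edges
    """
    depth = size - 1
    tree_edges = 2 ** (depth + 1) - 2  # edges of a full binary tree
    leaves = 2 ** depth
    total = 2 * tree_edges + leaves    # every tree edge is passed twice
    forced_equal = tree_edges + leaves # "up" edges and edges around leaves
    usable = total - forced_equal
    return total, usable, usable + 3
```

For every size the third component equals $2^{|P|} + 1$, matching the maximal solution length of the $2^n + 1$-PCP.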

Theorem 4. For every arithmetic concrete domain $\mathcal{D}$, satisfiability of
$\mathcal{ALCI}(\mathcal{D})$-concepts is NExpTime-hard.



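6 Satisfiability of $\mathcal{ALCRP}(\mathcal{D})$-Concepts

In this section, we prove that satisfiability of $\mathcal{ALCRP}(\mathcal{D})$-concepts without
reference to TBoxes is NExpTime-hard. Hence, adding the role-forming concrete
domain constructor yields another extension of $\mathcal{ALC}(\mathcal{D})$ in which reasoning is
much harder than in $\mathcal{ALC}(\mathcal{D})$ itself.

Given a PCP $P = (\ell_1, r_1), \ldots, (\ell_k, r_k)$, we define a concept $C_P$ of size
polynomial in $|P|$ which has a model iff $P$ has a solution. The concept $C_P$ can be found
in Figure 7, where $x$ and $y$ denote abstract features and $p$ denotes a predicate
(written in lowercase to avoid confusion with the PCP $P$). Again, the equalities
in the figure serve as abbreviations. Moreover, we use $C \to D$ as an abbreviation
for $\neg C \sqcup D$. Note that $S[g,p]$ denotes a predicate role and not a concept,
i.e., $S[g,p]$ is an abbreviation for the role-forming concrete domain constructor
$\exists(g), (g).\overline{p}$.

\[
\begin{aligned}
DistB[k] &= \bigsqcap_{i=0}^{k} \big((B_i \to \forall R.B_i) \sqcap (\neg B_i \to \forall R.\neg B_i)\big) \\
Tree &= \exists R.B_0 \sqcap \exists R.\neg B_0 \\
&\quad \sqcap \forall R.(DistB[0] \sqcap \exists R.B_1 \sqcap \exists R.\neg B_1) \\
&\;\;\vdots \\
&\quad \sqcap \forall R^{n-1}.(DistB[n-2] \sqcap \exists R.B_{n-1} \sqcap \exists R.\neg B_{n-1}) \\
S[g,p] &= \exists(g), (g).\overline{p} \\
Edge[g,p] &= \bigsqcup_{k=0}^{n-1} \Big(\bigsqcup_{j=0}^{k-1} \neg B_j \sqcap (B_k \to \forall S[g,p].\neg B_k) \sqcap (\neg B_k \to \forall S[g,p].B_k)\Big) \\
&\quad \sqcup \bigsqcup_{k=0}^{n-1} \Big(\bigsqcap_{j=0}^{k-1} B_j \sqcap (B_k \to \forall S[g,p].B_k) \sqcap (\neg B_k \to \forall S[g,p].\neg B_k)\Big) \\
DEdge &= (Edge[g_\ell, =] \sqcap Edge[g_r, =]) \\
&\quad \sqcup \bigsqcup_{(\ell_i, r_i) \text{ in } P} (Edge[g_\ell, \mathit{conc}_{\ell_i}] \sqcap Edge[g_r, \mathit{conc}_{r_i}]) \\
Ch[u_1, u_2, u_3, u_4] &= (\exists(u_1, u_2).{=} \sqcap \exists(u_3, u_4).{=}) \\
&\quad \sqcup \bigsqcup_{(\ell_i, r_i) \text{ in } P} (\exists(u_1, u_2).\mathit{conc}_{\ell_i} \sqcap \exists(u_3, u_4).\mathit{conc}_{r_i}) \\
C_P &= Tree \sqcap \forall R^n.\exists g_\ell.\mathit{word} \sqcap \forall R^n.\exists g_r.\mathit{word} \\
&\quad \sqcap \forall R^n.\big[(\neg B_0 \sqcap \cdots \sqcap \neg B_{n-1}) \to (\exists g_\ell.{=_\epsilon} \sqcap \exists g_r.{=_\epsilon}) \\
&\qquad \sqcap \neg(B_0 \sqcap \cdots \sqcap B_{n-1}) \to DEdge \\
&\qquad \sqcap (B_0 \sqcap \cdots \sqcap B_{n-1}) \to \\
&\qquad\quad (Ch[g_\ell, x g_\ell, g_r, x g_r] \sqcap Ch[x g_\ell, y g_\ell, x g_r, y g_r])\big]
\end{aligned}
\]

Fig. 7. The $\mathcal{ALCRP}(\mathcal{D})$ reduction concept $C_P$ ($n = |P|$).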

Figure 8 contains an example model of $C_P$ with $|P| = n = 2$. Obviously,
the models of $C_P$ are rather similar to the ones from the $\mathcal{ALC}(\mathcal{D})$ reduction in
Section 4: models have the form of a binary tree of depth $n$ whose edges are
labelled with the role $R$ and whose leaves (together with two “extra” nodes) are
connected by two predicate chains of length $2^n + 1$. The Tree concept enforces
the existence of the binary tree. The concept names $B_0, \ldots, B_{n-1}$ are used for
a binary numbering (from 0 to $2^n - 1$) of the leaves of the tree. More precisely,
for a domain object $a \in \Delta_{\mathcal{I}}$, set

\[
\mathit{pos}(a) = \sum_{i=0}^{n-1} \beta_i(a) \cdot 2^i \quad \text{where} \quad
\beta_i(a) = \begin{cases} 1 & \text{if } a \in B_i^{\mathcal{I}} \\ 0 & \text{otherwise.} \end{cases}
\]

Fig. 8. An example model of $C_P$ with $|P| = 2$.



The Tree and DistB concepts ensure that, if two leaves $a$ and $a'$ are reachable
via different paths from the root node, then we have $\mathit{pos}(a) \neq \mathit{pos}(a')$. Due to
the first line of the $C_P$ concept, every leaf has (concrete) $g_\ell$- and $g_r$-successors.
The last two lines of $C_P$ guarantee the existence of the two extra nodes which
are connected by predicate edges due to the use of the Ch concepts. Hence, it
remains to describe how the edges between the leaf nodes are established.

There are two main ideas underlying the establishment of these edges: (i)
use the role-forming predicate constructor to establish single edges and (ii) use
the position $\mathit{pos}()$ of leaf nodes together with the fact that counting modulo $2^n$
can be expressed by $\mathcal{ALC}$-concepts to do this with a concept of size polynomial
in $|P|$. We first illustrate Point (i) in an abstract way. Assume that we have
two abstract objects $a$ and $b$, $a$ has $g_\ell$-successor $x$ and $b$ has $g_\ell$-successor $y$.
Moreover, let $b \in X^{\mathcal{I}}$ for some concept $X$. We may then establish a $p$-edge (for
some binary predicate $p \in \Phi_{\mathcal{W}}$) between $x$ and $y$ as follows: we enforce that
$a \in (\forall S[g_\ell, p].\neg X)^{\mathcal{I}}$; since $b \in X^{\mathcal{I}}$, it follows that $(a, b) \notin S[g_\ell, p]^{\mathcal{I}}$, i.e., $(a, b) \notin
(\exists(g_\ell), (g_\ell).\overline{p})^{\mathcal{I}}$ and thus $(x, y) \notin \overline{p}^{\mathcal{W}}$, which obviously implies that $(x, y) \in p^{\mathcal{W}}$.

In the third line of the Cp-concept, the DEdge concept is used to establish 
edges between the leaf nodes. The DEdge concept itself is just a disjunction over 
the various edge types while the Edge concept actually establishes the edges. In 
principle, the Edge concept establishes the edges as described above. However, 
it does this not only for two fixed nodes as in the description above but for all 
neighboring leaf nodes. To see how this is achieved, note that Edge is essentially 
the negation of the well-known propositional formula 

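$$\bigwedge_{k=0}^{n-1} \Bigl(\bigwedge_{j=0}^{k-1} x_j = 1\Bigr) \rightarrow (x_k = 1 \leftrightarrow x'_k = 0) \;\wedge\; \bigwedge_{k=0}^{n-1} \Bigl(\bigvee_{j=0}^{k-1} x_j = 0\Bigr) \rightarrow (x_k = x'_k)$$

which encodes incrementation modulo $2^n$, i.e., if $t$ is the number (binary) encoded
by the propositional variables $x_0, \dots, x_{n-1}$ and $t'$ is the number encoded
by the propositional variables $x'_0, \dots, x'_{n-1}$, then we have $t' = t + 1$ modulo $2^n$,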
cf. [4]. Assume $a \in (Edge[g_\ell,p])^{\mathcal{I}}$ (where $p$ is either “=” or $\mathit{conc}_{\ell_i}$) and let $b$
be the leaf with $pos(b) = pos(a) + 1$, $x$ be the $g_\ell$-successor of $a$, and $y$ be the
$g_\ell$-successor of $b$. The Edge concept ensures that, for each $S[g_\ell,p]$-successor $c$
of $a$, we have $pos(c) \neq pos(a) + 1$, i.e., there exists an $i$ with $0 \leq i < n$ such
that $c$ differs from $b$ in the interpretation of $B_i$. It follows that $(a,b) \notin S[g_\ell,p]^{\mathcal{I}}$.
As described above, we can conclude $(x,y) \in p^{\mathcal{D}}$. All remaining issues such as,
e.g., ensuring that one of the partial solutions is in fact a solution, are as in the
reduction given in Section 4. Note that the reduction concept is restricted in the
sense of Section 2.
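The claim that the propositional formula encodes incrementation modulo $2^n$ can be checked by brute force for small $n$. The following Python sketch is an illustration only; the helper names `bits`, `is_successor` and `check` are hypothetical and not part of the reduction.

```python
from itertools import product


def bits(t, n):
    """Little-endian bit list of t: bits(t, n)[k] is the value of x_k."""
    return [(t >> k) & 1 for k in range(n)]


def is_successor(x, x_prime):
    """Evaluate the incrementation formula on two bit vectors of equal length."""
    for k in range(len(x)):
        if all(x[j] == 1 for j in range(k)):
            # All lower bits are 1 (vacuously true for k = 0): bit k must flip.
            if x[k] == x_prime[k]:
                return False
        else:
            # Some lower bit is 0: bit k must stay unchanged.
            if x[k] != x_prime[k]:
                return False
    return True


def check(n):
    """True iff the formula holds exactly for the pairs (t, t + 1 mod 2^n)."""
    return all(
        is_successor(bits(t, n), bits(u, n)) == (u == (t + 1) % 2 ** n)
        for t, u in product(range(2 ** n), repeat=2)
    )


print(check(3))
```

Since Edge is essentially the negation of this formula, it singles out the objects whose position is not the successor position, which is what the argument above relies on.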

Theorem 5. For every arithmetic concrete domain $\mathcal{D}$, satisfiability of
$\mathcal{ALCRP}(\mathcal{D})$-concepts is NExpTime-hard.

7 Upper Bounds 

Due to space limitations, we can only give a short sketch of the proof of the upper
bound and refer to [16] for details. First, a tableau algorithm for deciding the
satisfiability of $\mathcal{ALCRPI}(\mathcal{D})$-concepts without reference to TBoxes is devised.
This algorithm combines techniques from [8] for reasoning with $\mathcal{ALCRP}(\mathcal{D})$
with techniques from [10] for reasoning with inverse roles. Second, the tableau
algorithm is modified to take into account TBoxes by performing “on the fly
unfolding” of the TBox as described in [13]. A complexity analysis yields the
following theorem.

Theorem 6. If $\mathcal{D}$-satisfiability is in NP, satisfiability of $\mathcal{ALCRPI}(\mathcal{D})$-concepts
w.r.t. TBoxes can be decided in nondeterministic exponential time.

This also gives an upper bound for subsumption since, as mentioned in Section 2, 
subsumption can be reduced to satisfiability. It should be noted that the above 
theorem only applies to so-called admissible concrete domains, where a concrete 
domain $\mathcal{D}$ is admissible if the set $\Phi_{\mathcal{D}}$ is closed under negation and contains a
predicate name $\top_{\mathcal{D}}$ for $\Delta_{\mathcal{D}}$ [16]. Nevertheless, the given theorem captures a large
class of interesting concrete domains such as V itself and concrete domains for
temporal and spatial reasoning [8,15]. In contrast to the upper bound for $\mathcal{ALC}(\mathcal{D})$
established in [14], the above theorem is concerned with concrete domains for
which $\mathcal{D}$-satisfiability is in NP instead of in PSpace. For concrete domains of
this latter type, the tableau algorithm in [16] yields an ExpSpace upper bound.
A matching lower bound, however, is yet to be proved. 

8 Related and Future Work 

We demonstrated that the PSpace upper bound for $\mathcal{ALC}(\mathcal{D})$-concept satisfiability
is not robust: complexity shifts to NExpTime if seemingly harmless
constructors are added, and satisfiability even becomes undecidable if we admit general TBoxes.
However, the situation is not hopeless in all cases. Although the class of arithmetic
concrete domains is quite large and captures many interesting concrete
domains, there still exist non-trivial concrete domains for which reasoning with
general TBoxes is decidable and for which the NExpTime lower bound obtained in this
paper presumably does not hold. An example is presented in [15], where a temporal
Description Logic based on concrete domains is defined.

As future work, it would be interesting to extend the obtained logics by
additional means of expressivity such as transitive roles and qualifying number
restrictions [11]. There are at least two ways to go: In [14] it is proved that reasoning
with $\mathcal{ALCF}(\mathcal{D})$, i.e., the extension of $\mathcal{ALC}(\mathcal{D})$ with feature agreements and
disagreements, is PSpace-complete (if reasoning with $\mathcal{D}$ is in PSpace). Hence,
one could define extensions of $\mathcal{ALCF}(\mathcal{D})$ trying to obtain an expressive logic for
which reasoning is still in PSpace. The second approach is to define extensions of
$\mathcal{ALCI}(\mathcal{D})$, which means that the obtained logics are at least NExpTime-hard.
Moreover, feature (dis)agreements, which are very closely related to concrete
domains, cannot be considered since, in [16], we prove that the combination of
inverse roles and feature (dis)agreements leads to undecidability.

Acknowledgements. My thanks go to Franz Baader, Ulrike Sattler, and
Stephan Tobies for inspiring discussions. The work in this paper was supported
by the DFG Project BA1122/3-1 “Combinations of Modal and Description Logics”.
Abstract. This paper investigates optimization techniques and data
structures exploiting the use of so-called pseudo models. These techniques
are applied to speed up TBox and ABox reasoning for the description
logics $\mathcal{ALCNH}_{R^+}$ and $\mathcal{ALC}(\mathcal{D})$. The advances are demonstrated by an
empirical analysis using the description logic system RACE that implements
TBox and ABox reasoning for $\mathcal{ALCNH}_{R^+}$.



1 Introduction 

We introduce and analyze optimization techniques for reasoning in expressive 
description logics exploiting so-called pseudo models. The new techniques being 
investigated are called deep model merging and individual model merging. The 
presented algorithms are empirically evaluated using TBoxes and ABoxes derived 
from actual applications. The model merging technique is also developed for the
logic $\mathcal{ALC}(\mathcal{D})$ [1] which supports so-called concrete domains. This is motivated
by a proposal which extends $\mathcal{ALCNH}_{R^+}$ with a restricted form of concrete
domains [4].



1.1 The Language $\mathcal{ALCNH}_{R^+}$

We briefly introduce the description logic (DL) $\mathcal{ALCNH}_{R^+}$ [3] (see the tables
in Figure 1) using a standard Tarski-style semantics based on an interpretation
$\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$. $\mathcal{ALCNH}_{R^+}$ extends the basic description logic $\mathcal{ALC}$ by role
hierarchies, transitively closed roles, and number restrictions. Note that the
combination of transitive roles and role hierarchies implies the expressiveness of
so-called general inclusion axioms (GCIs). The language definition is slightly
extended compared to the one given in [3] since we additionally support the
declaration of “native” features. This allows additional optimizations, e.g. an
efficient treatment of features by the model merging technique (see below). The
concept name $\top$ ($\bot$) is used as an abbreviation for $C \sqcup \neg C$ ($C \sqcap \neg C$). We assume
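Concepts

Syntax | Semantics
$A$ | $A^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$
$\neg C$ | $\Delta^{\mathcal{I}} \setminus C^{\mathcal{I}}$
$C \sqcap D$ | $C^{\mathcal{I}} \cap D^{\mathcal{I}}$
$C \sqcup D$ | $C^{\mathcal{I}} \cup D^{\mathcal{I}}$
$\exists R.C$ | $\{a \in \Delta^{\mathcal{I}} \mid \exists b \in \Delta^{\mathcal{I}} : (a,b) \in R^{\mathcal{I}}, b \in C^{\mathcal{I}}\}$
$\forall R.C$ | $\{a \in \Delta^{\mathcal{I}} \mid \forall b : (a,b) \in R^{\mathcal{I}} \Rightarrow b \in C^{\mathcal{I}}\}$
$\exists_{\geq n}\,S$ | $\{a \in \Delta^{\mathcal{I}} \mid \|\{b \in \Delta^{\mathcal{I}} \mid (a,b) \in S^{\mathcal{I}}\}\| \geq n\}$
$\exists_{\leq m}\,S$ | $\{a \in \Delta^{\mathcal{I}} \mid \|\{b \in \Delta^{\mathcal{I}} \mid (a,b) \in S^{\mathcal{I}}\}\| \leq m\}$

Roles

Syntax | Semantics
$R$ | $R^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$

Terminological Axioms

Syntax | Satisfied if
$\mathsf{R} \in T$ | $\mathsf{R}^{\mathcal{I}} = (\mathsf{R}^{\mathcal{I}})^{+}$
$\mathsf{F} \in F$ | $\Delta^{\mathcal{I}} \subseteq (\exists_{\leq 1}\,\mathsf{F})^{\mathcal{I}}$
$R \sqsubseteq S$ | $R^{\mathcal{I}} \subseteq S^{\mathcal{I}}$
$C \sqsubseteq D$ | $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$

Assertions

Syntax | Satisfied if
$a : C$ | $a^{\mathcal{I}} \in C^{\mathcal{I}}$
$(a,b) : R$ | $(a^{\mathcal{I}}, b^{\mathcal{I}}) \in R^{\mathcal{I}}$

Fig. 1. Syntax and Semantics of $\mathcal{ALCNH}_{R^+}$ ($n, m \in \mathbb{N}$, $n > 0$, $\|\cdot\|$ denotes set
cardinality, and $\mathsf{S} \in S$).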



a set of concept names $C$, a set of role names $R$, and a set of individual names
$O$. The mutually disjoint subsets $F$, $P$, $T$ of $R$ denote features, non-transitive,
and transitive roles, respectively ($R = F \cup P \cup T$).

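If $\mathsf{R}, \mathsf{S} \in R$ are role names, then the terminological axiom $\mathsf{R} \sqsubseteq \mathsf{S}$ is called a role
inclusion axiom. A role hierarchy $\mathcal{R}$ is a finite set of role inclusion axioms. Then,
we define $\sqsubseteq^*$ as the reflexive transitive closure of $\sqsubseteq$ over such a role hierarchy
$\mathcal{R}$. Given $\sqsubseteq^*$, the set of roles $\mathsf{R}^{\downarrow} = \{\mathsf{S} \in R \mid \mathsf{S} \sqsubseteq^* \mathsf{R}\}$ defines the descendants of a
role $\mathsf{R}$. $\mathsf{R}^{\uparrow} = \{\mathsf{S} \in R \mid \mathsf{R} \sqsubseteq^* \mathsf{S}\}$ is the set of ancestors of a role $\mathsf{R}$. We also define
the set $S = \{\mathsf{R} \in P \mid \mathsf{R}^{\downarrow} \cap T = \emptyset\}$ of simple roles that are neither transitive nor
have a transitive role as descendant. Every descendant $\mathsf{G}$ of a feature $\mathsf{F}$ must be
a feature as well ($\mathsf{G} \in F$).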
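The descendant, ancestor, and simple-role sets can be computed by a straightforward closure over the role inclusion axioms. The following Python sketch is only an illustration under the assumption that a role hierarchy is given as a list of `(sub, super)` pairs; the function names and the example roles are hypothetical.

```python
def descendants(role, inclusions):
    """Return all roles S with S below-or-equal role (role itself included)."""
    found = {role}
    changed = True
    while changed:
        changed = False
        for sub, sup in inclusions:
            if sup in found and sub not in found:
                found.add(sub)
                changed = True
    return found


def ancestors(role, inclusions):
    """Return all roles S with role below-or-equal S (role itself included)."""
    flipped = [(sup, sub) for sub, sup in inclusions]
    return descendants(role, flipped)


def simple_roles(non_transitive, transitive, inclusions):
    """Non-transitive roles that have no transitive descendant."""
    return {r for r in non_transitive
            if not (descendants(r, inclusions) & set(transitive))}


# Hypothetical hierarchy: the transitive role t is included in r; s is unrelated.
inclusions = [("t", "r")]
P = {"r", "s"}
T = {"t"}
print(simple_roles(P, T, inclusions))
```

In this example only `s` is simple, so a number restriction on `r` would violate the syntactic restriction that number restrictions are only allowed for simple roles.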

A syntactic restriction holds for the combinability of number restrictions and
transitive roles in $\mathcal{ALCNH}_{R^+}$. Number restrictions are only allowed for simple
roles. This restriction is motivated by an undecidability result in case of an
unrestricted combinability [8].

If $C$ and $D$ are concept terms, then $C \sqsubseteq D$ (generalized concept inclusion or
GCI) is a terminological axiom. A finite set of terminological axioms $\mathcal{T}$ is called
a terminology or TBox w.r.t. a given role hierarchy $\mathcal{R}$.¹

An ABox $\mathcal{A}$ is a finite set of assertional axioms as defined in Figure 1c. The set
$O$ of object names is divided into two disjoint subsets, $O_O$ and $O_N$.² An initial
ABox $\mathcal{A}$ may contain only assertions mentioning old individuals (from $O_O$).
Every individual name from $O$ is mapped to a single element of $\Delta^{\mathcal{I}}$ in a way such
that for $a, b \in O_O$, $a^{\mathcal{I}} \neq b^{\mathcal{I}}$ if $a \neq b$ (unique name assumption or UNA). This
ensures that different individuals in $O_O$ are interpreted as different elements. The
UNA does not hold for elements of $O_N$, i.e. for $a, b \in O_N$, $a^{\mathcal{I}} = b^{\mathcal{I}}$ may hold even
if $a \neq b$, or if we assume without loss of generality that $a \in O_N$, $b \in O_O$.

¹ The reference to $\mathcal{R}$ is omitted in the following.

² The set of “old” individual names characterizes all individuals for which the unique
name assumption holds while the set of “new” names denotes individuals which are
constructed during a proof.
The ABox consistency problem is to decide whether a given ABox $\mathcal{A}$ is
consistent w.r.t. a TBox $\mathcal{T}$. Satisfiability of concept terms can be reduced to ABox
consistency as follows: A concept term $C$ is satisfiable iff the ABox $\{a{:}C\}$ is
consistent. The instance problem is to determine whether an individual $a$ is an
instance of a concept term $C$ w.r.t. an ABox $\mathcal{A}$ and a TBox $\mathcal{T}$, i.e. whether $\mathcal{A}$
entails $a{:}C$ w.r.t. $\mathcal{T}$. This problem can be reduced to the problem of deciding if
the ABox $\mathcal{A} \cup \{a{:}\neg C\}$ is inconsistent w.r.t. $\mathcal{T}$.



1.2 A Tableaux Calculus for $\mathcal{ALCNH}_{R^+}$

In the following we present a tableaux algorithm to decide the consistency of
$\mathcal{ALCNH}_{R^+}$ ABoxes. The algorithm is characterized by a set of tableaux or
completion rules and by a particular completion strategy ensuring a specific order for
applying the completion rules to assertional axioms of an ABox. The strategy is
essential to guarantee the completeness of the ABox consistency algorithm. The 
purpose of the calculus is to generate a so-called completion for an initial ABox 
A in order to prove the consistency of A or its inconsistency if no completion 
can be found. 

First, we introduce new assertional axioms needed to define the augmentation
of an initial ABox. Let $C$ be a concept term, $a, b \in O$ be individual names, and
$x \notin O$, then the following expressions are also assertional axioms: (1) $\forall x.(x{:}C)$
(universal concept assertion), (2) $a \neq b$ (inequality assertion). An interpretation
$\mathcal{I}$ satisfies an assertional axiom $\forall x.(x{:}C)$ iff $C^{\mathcal{I}} = \Delta^{\mathcal{I}}$ and $a \neq b$ iff $a^{\mathcal{I}} \neq b^{\mathcal{I}}$.

We are now ready to define an augmented ABox as input to the tableaux
rules. For an initial ABox $\mathcal{A}$ w.r.t. a TBox $\mathcal{T}$ and a role hierarchy $\mathcal{R}$ we define
its augmented ABox or its augmentation $\mathcal{A}'$ by applying the following rules
to $\mathcal{A}$. For every feature name $F$ mentioned in $\mathcal{A}$ the assertion $\forall x.(x{:}(\exists_{\leq 1}\,F))$
is added to $\mathcal{A}'$. For every GCI $C \sqsubseteq D$ in $\mathcal{T}$ the assertion $\forall x.(x{:}(\neg C \sqcup D))$ is
added to $\mathcal{A}'$. Every concept term occurring in $\mathcal{A}$ is transformed into its usual
negation normal form. Let $O'_O = \{a_1, \dots, a_n\} \subseteq O_O$ be the set of individuals
mentioned in $\mathcal{A}$, then the following set of inequality assertions is added to $\mathcal{A}'$:
$\{a_i \neq a_j \mid a_i, a_j \in O'_O,\ i,j \in 1..n,\ i \neq j\}$. Obviously, if $\mathcal{A}'$ is an augmentation of $\mathcal{A}$
then $\mathcal{A}'$ is consistent iff $\mathcal{A}$ is consistent.

$\mathcal{ALCNH}_{R^+}$ supports transitive roles and GCIs. Thus, in order to guarantee
the termination of the tableaux calculus, the notion of blocking an individual
for the applicability of tableaux rules is introduced as follows. Given an
ABox $\mathcal{A}$ and an individual $a$ occurring in $\mathcal{A}$, we define the concept set of $a$ as
$\sigma(\mathcal{A}, a) := \{\top\} \cup \{C \mid a{:}C \in \mathcal{A}\}$. We define an individual ordering $\prec$ for new
individuals (elements of $O_N$) occurring in an ABox $\mathcal{A}$. If $b \in O_N$ is introduced
into $\mathcal{A}$, then $a \prec b$ for all new individuals $a$ already present in $\mathcal{A}$. Let $\mathcal{A}$ be an
ABox and $a, b \in O$ be individuals in $\mathcal{A}$. We call $a$ the blocking individual of $b$
if all of the following conditions hold: (1) $a, b \in O_N$, (2) $\sigma(\mathcal{A}, a) \supseteq \sigma(\mathcal{A}, b)$, (3)
$a \prec b$. If there exists a blocking individual $a$ for $b$, then $b$ is said to be blocked
(by $a$).
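The blocking condition can be sketched in Python as follows. The sketch assumes, purely for illustration, that an ABox is a set of `(individual, concept)` pairs, that concepts are plain strings, and that the new individuals are kept in a list in order of introduction (which represents the ordering $\prec$); none of these encodings is prescribed by the text.

```python
def concept_set(abox, individual):
    """sigma(A, a): the top concept plus every concept asserted for the individual."""
    return {"TOP"} | {c for (ind, c) in abox if ind == individual}


def is_blocked(abox, b, new_individuals):
    """Check the three blocking conditions for b.

    new_individuals lists the elements of O_N in order of introduction, so an
    individual precedes b in the ordering iff it appears earlier in the list.
    """
    if b not in new_individuals:
        return False  # condition (1): only new individuals can be blocked
    earlier = new_individuals[: new_individuals.index(b)]  # condition (3)
    # condition (2): some earlier individual has a superset of b's concept set
    return any(concept_set(abox, a) >= concept_set(abox, b) for a in earlier)


# Hypothetical ABox: x1 was introduced before x2 and carries all of x2's concepts.
abox = {("x1", "C"), ("x1", "D"), ("x2", "C"), ("a", "C")}
new_individuals = ["x1", "x2"]
print(is_blocked(abox, "x2", new_individuals))
```

Here `x2` is blocked by `x1`, so no generating rule would be applied to `x2`; the old individual `a` can never be blocked.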

We are now ready to define the completion rules that are intended to generate 
a so-called completion (see also below) of an initial ABox A w.r.t. a TBox T. 
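R$\sqcap$ The conjunction rule.
  if $a{:}C \sqcap D \in \mathcal{A}$, and $\{a{:}C, a{:}D\} \not\subseteq \mathcal{A}$
  then $\mathcal{A}' = \mathcal{A} \cup \{a{:}C, a{:}D\}$

R$\sqcup$ The disjunction rule.
  if $a{:}C \sqcup D \in \mathcal{A}$, and $\{a{:}C, a{:}D\} \cap \mathcal{A} = \emptyset$
  then $\mathcal{A}' = \mathcal{A} \cup \{a{:}C\}$ or $\mathcal{A}' = \mathcal{A} \cup \{a{:}D\}$

R$\forall$C The role value restriction rule.
  if $a{:}\forall R.C \in \mathcal{A}$, and $\exists b \in O$, $S \in R^{\downarrow}$ : $(a,b){:}S \in \mathcal{A}$, and $b{:}C \notin \mathcal{A}$
  then $\mathcal{A}' = \mathcal{A} \cup \{b{:}C\}$

R$\forall_+$C The transitive role value restriction rule.
  if 1. $a{:}\forall R.C \in \mathcal{A}$, and $\exists b \in O$, $\mathsf{T} \in R^{\downarrow}$, $\mathsf{T} \in T$, $S \in \mathsf{T}^{\downarrow}$ : $(a,b){:}S \in \mathcal{A}$, and
     2. $b{:}\forall \mathsf{T}.C \notin \mathcal{A}$
  then $\mathcal{A}' = \mathcal{A} \cup \{b{:}\forall \mathsf{T}.C\}$

R$\forall_x$ The universal concept restriction rule.
  if $\forall x.(x{:}C) \in \mathcal{A}$, and $\exists a \in O$ : $a$ mentioned in $\mathcal{A}$, and $a{:}C \notin \mathcal{A}$
  then $\mathcal{A}' = \mathcal{A} \cup \{a{:}C\}$

R$\exists$C The role exists restriction rule.
  if 1. $a{:}\exists R.C \in \mathcal{A}$, and $a$ is not blocked, and
     2. $\neg\exists b \in O$, $S \in R^{\downarrow}$ : $\{(a,b){:}S,\ b{:}C\} \subseteq \mathcal{A}$
  then $\mathcal{A}' = \mathcal{A} \cup \{(a,b){:}R,\ b{:}C\}$ where $b \in O_N$ is not used in $\mathcal{A}$

R$\exists_{\geq n}$ The number restriction exists rule.
  if 1. $a{:}\exists_{\geq n}\,R \in \mathcal{A}$, and $a$ is not blocked, and
     2. $\neg\exists b_1, \dots, b_n \in O$, $S_1, \dots, S_n \in R^{\downarrow}$ :
        $\{(a,b_k){:}S_k \mid k \in 1..n\} \cup \{b_i \neq b_j \mid i,j \in 1..n,\ i \neq j\} \subseteq \mathcal{A}$
  then $\mathcal{A}' = \mathcal{A} \cup \{(a,b_k){:}R \mid k \in 1..n\} \cup \{b_i \neq b_j \mid i,j \in 1..n,\ i \neq j\}$
       where $b_1, \dots, b_n \in O_N$ are not used in $\mathcal{A}$

R$\exists_{\leq n}$ The number restriction merge rule.
  if 1. $a{:}\exists_{\leq n}\,R \in \mathcal{A}$, and
     2. $\exists b_1, \dots, b_m \in O$, $S_1, \dots, S_m \in R^{\downarrow}$ : $\{(a,b_1){:}S_1, \dots, (a,b_m){:}S_m\} \subseteq \mathcal{A}$
        with $m > n$, and
     3. $\exists b_i, b_j \in \{b_1, \dots, b_m\}$ : $i \neq j$, $b_i \neq b_j \notin \mathcal{A}$
  then $\mathcal{A}' = \mathcal{A}[b_i/b_j]$, i.e. replace every occurrence of $b_i$ in $\mathcal{A}$ by $b_j$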

Given an ABox A, more than one rule might be applicable to A. The order is 
determined by the completion strategy which is defined as follows. 

A meta rule controls the priority between individuals: Apply a tableaux rule
to an individual $b \in O_N$ only if no rule is applicable to an individual $a \in O_O$
and if no rule is applicable to another individual $c \in O_N$ such that $c \prec b$.

The completion rules are always applied in the following order. (1) Apply all
non-generating rules (R$\sqcap$, R$\sqcup$, R$\forall$C, R$\forall_+$C, R$\forall_x$, R$\exists_{\leq n}$) as long as possible. (2)
Apply a generating rule (R$\exists$C, R$\exists_{\geq n}$) once and continue with step 1.

In the following we always assume that the completion strategy is observed.
This ensures that rules are applied to new individuals w.r.t. the ordering $\prec$.
We assume the same naming conventions as used above. An ABox $\mathcal{A}$ is called
contradictory if one of the following clash triggers is applicable. If none of the
clash triggers is applicable to $\mathcal{A}$, then $\mathcal{A}$ is called clash-free. The clash triggers
have to deal with so-called primitive clashes and with clashes caused by number
restrictions:

- Primitive clash: $a{:}\bot \in \mathcal{A}$ or $\{a{:}A, a{:}\neg A\} \subseteq \mathcal{A}$, where $A$ is a concept name.

- Number restriction merging clash: $\exists S_1, \dots, S_m \in R^{\downarrow}$ : $\{(a,b_i){:}S_i \mid i \in 1..m\} \cup
  \{a{:}\exists_{\leq n}\,R\} \cup \{b_i \neq b_j \mid i,j \in 1..m,\ i \neq j\} \subseteq \mathcal{A}$ with $m > n$.

Any ABox containing a clash is obviously unsatisfiable. A clash-free ABox A 
is called complete if no completion rule is applicable to A. A complete ABox A' 
derived from an initial ABox A is called a completion of A. The purpose of the 
calculus is to generate a completion for an initial ABox A to prove the consistency 
of A. An augmented ABox A is said to be inconsistent if no completion can be 
derived. For a given initial ABox A, the calculus applies the completion rules. 
It stops the application of rules if a clash occurs. The calculus answers “yes”
if a completion can be derived, and “no” otherwise. Based on these notions we
introduce and evaluate the new optimization techniques in the next sections. 



2 Deep Models for TBox Reasoning in $\mathcal{ALCNH}_{R^+}$



Given a set of concepts representing a conjunction whose satisfiability is to be 
checked, the model merging strategy tries to avoid a satisfiability test which 
relies on the “expensive” tableaux technique due to non-deterministic rules.³
This idea was first introduced in [5] for the logic $\mathcal{ALCH}f_{R^+}$. A model merging
test is designed to be a “cheap” test comparing cached “concept models.” It is a 
sound but incomplete satisfiability tester for a set of concepts. The achievement 
of minimal computational overhead and the avoidance of any indeterminism are 
important characteristics of such a test. If the test returns false, a tableaux 
calculus based on the rules as defined in Section 1.2 is applied. In order to be 
more precise, we use the term pseudo model instead of “concept model.” 

For testing whether the conjunction of a set of concepts $\{C_1, \dots, C_n\}$ is
satisfiable, we present and analyze a technique called deep model merging that
generalizes the original model merging approach [5] in two ways: (1) we extend the
model merging technique to the logic $\mathcal{ALCNH}_{R^+}$, i.e. this technique also deals
with number restrictions; (2) we introduce deep pseudo models for concepts that
are recursively traversed and checked for possible clashes.

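Let $A$ be a concept name, $R$ a role name, and $C$ a concept. The consistency
of the initial ABox $\mathcal{A} = \{a{:}C\}$ is tested. If $\mathcal{A}$ is inconsistent, the pseudo model⁴
of $C$ is defined as $\bot$. If $\mathcal{A}$ is consistent, then there exists a non-empty set of
completions $\mathcal{C}$. A completion $\mathcal{A}' \in \mathcal{C}$ is selected and a pmodel $M$ for a concept $C$
is defined as the tuple $\langle M^{A}, M^{\neg A}, M^{\exists}, M^{\forall} \rangle$ of concept sets using the following
definitions.

$$M^{A} = \{A \mid a{:}A \in \mathcal{A}'\}, \qquad M^{\neg A} = \{A \mid a{:}\neg A \in \mathcal{A}'\}$$

$$M^{\exists} = \{\exists R.C \mid a{:}\exists R.C \in \mathcal{A}'\} \cup \{\exists_{\geq n}\,R \mid a{:}\exists_{\geq n}\,R \in \mathcal{A}'\}$$

$$M^{\forall} = \{\forall R.C \mid a{:}\forall R.C \in \mathcal{A}'\} \cup \{\exists_{\leq n}\,R \mid a{:}\exists_{\leq n}\,R \in \mathcal{A}'\} \cup \{\exists R.C \mid a{:}\exists R.C \in \mathcal{A}',\ R \in F\}$$

³ In our case the rules R$\sqcup$ and R$\exists_{\leq n}$.

⁴ For brevity a pseudo model is also called a pmodel.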

Note that pmodels are based on complete ABoxes. In contrast to the theoret- 
ical calculus presented above, model merging deals directly with features instead 
of representing them with at most restrictions. Therefore concept exists
restrictions mentioning features are also included in the sets $M^{\forall}$ of pmodels. This
guarantees that a possible “feature interaction” between pmodels is detected.
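The construction of a pmodel from a completion can be sketched as follows. The tuple encoding of concepts, the `PModel` type, and the function name `pmodel` are assumptions made for this illustration; the sketch only reads the concept assertions of the root individual, as in the definitions above.

```python
from collections import namedtuple

# Concepts are nested tuples (an encoding assumed for this sketch):
#   ("atom", "A"), ("not", "A"), ("some", role, concept), ("all", role, concept),
#   ("atleast", n, role), ("atmost", n, role)
PModel = namedtuple("PModel", "pos neg exists forall")


def pmodel(completion, root, features=frozenset()):
    """Collect the four concept sets for `root` from (individual, concept) pairs."""
    pos, neg, exists, forall = set(), set(), set(), set()
    for individual, concept in completion:
        if individual != root:
            continue
        kind = concept[0]
        if kind == "atom":
            pos.add(concept[1])
        elif kind == "not":
            neg.add(concept[1])
        elif kind == "some":
            exists.add(concept)
            if concept[1] in features:
                # exists restrictions on features are also kept in the "forall" set
                forall.add(concept)
        elif kind == "atleast":
            exists.add(concept)
        elif kind in ("all", "atmost"):
            forall.add(concept)
    return PModel(frozenset(pos), frozenset(neg), frozenset(exists), frozenset(forall))


# Hypothetical completion with root individual "a" and feature "f".
completion = {
    ("a", ("atom", "A")),
    ("a", ("some", "f", ("atom", "B"))),
    ("a", ("all", "r", ("atom", "C"))),
    ("b", ("atom", "B")),
}
m = pmodel(completion, "a", features={"f"})
print(sorted(m.pos), len(m.exists), len(m.forall))
```

Assertions about other individuals (here `b`) are ignored, which reflects that a pmodel only records what holds for the root individual.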



Procedure 1 mergable($MS$, $VM$, $D?$)

 1: if $MS = \emptyset \vee MS \in VM$ then
 2:     return true
 3: else if $\bot \in MS \vee \neg$atoms_mergable($MS$) then
 4:     return false
 5: else
 6:     for all $M \in MS$ do
 7:         for all $C \in M^{\exists}$ do
 8:             if critical_at_most($C$, $M$, $MS$) then
 9:                 return false
10:             else
11:                 $MS' \leftarrow$ collect_successor_pmodels($C$, $MS$)
12:                 if $(\neg D? \wedge MS' \neq \emptyset) \vee \neg$mergable($MS'$, $VM \cup \{MS\}$, $D?$) then
13:                     return false
14: return true



The procedure mergable shown in Procedure 1 implements the flat and
deep model merging test. The test has to discover potential clashes which might
occur if all pmodels in $MS$ are merged, i.e. their corresponding concepts are
conjunctively combined. The test starts with a set of pmodels $MS$, an empty set
of visited pmodel sets $VM$, and a parameter $D?$ controlling whether the deep or
flat mode (see below) of mergable will be used. The test recursively traverses
the pmodel structures. In case of a potential clash, mergable terminates and
returns false, otherwise it continues its traversal and returns true if no potential
clash can be discovered. Testing whether the actual pmodel set $MS$ is already a
member of the set $VM$ (line 1) is necessary to ensure termination (in analogy to
blocking an individual) for the deep mode. A potential primitive clash is checked
in line 3. If no primitive clash is possible for the “root individual” of the pmodel,
it is tested whether a clash might be caused by interacting concept exists, concept
value, at least, and at most restrictions for the same successor individuals. Two
nested loops (lines 6-13) check for every pmodel $M \in MS$ and every concept $C$
in the set $M^{\exists}$ whether an at most restriction might be violated by the other
pmodels (line 8). If this is not the case,⁵ the set $MS'$ of “$R$-successor pmodels”
is computed (line 11). If the flat mode is enabled and $MS' \neq \emptyset$, this indicates
a potential interaction via the role $R$ and mergable returns false (lines 12-13).
If the deep mode is enabled, mergable continues and traverses the pmodels
in $MS'$. Observe that the procedure mergable is sound but not complete, i.e.
even if mergable returns false for a pmodel set the corresponding concept
conjunction can be satisfiable.

The procedure atoms_mergable tests for a possible primitive clash between
pairs of pmodels. It is applied to a set of pmodels $MS$ and returns false if
there exists a pair $\{M_1, M_2\} \subseteq MS$ with $(M_1^{A} \cap M_2^{\neg A}) \neq \emptyset$ or $(M_1^{\neg A} \cap M_2^{A}) \neq \emptyset$.
Otherwise it returns true.

The procedure critical_at_most checks for a potential number restriction
clash in a set of pmodels and tries to avoid positive answers which are too
conservative. It is applied to a concept $C$ of the form $\exists S.D$ or $\exists_{\geq n}\,S$, the current pmodel
$M$ and a set of pmodels $MS = \{M_1, \dots, M_k\}$. Loosely speaking, it computes the
maximal number of potential $S$-successors and returns true if this number
exceeds the applicable at most bound $m$. More precisely, critical_at_most returns
true if there exists a pmodel $M' \in (MS \setminus M)$ and a role $R \in S^{\uparrow}$ with $\exists_{\leq m}\,R \in M'^{\forall}$
such that $\sum_{E \in N} num(E, RS) > m$, $N = \bigcup_{M \in MS} M^{\exists}$, $RS = S^{\uparrow} \cap R^{\downarrow}$. In all other
cases critical_at_most returns false. The function $num(E, RS)$ returns 1 for
concepts of the form $E = \exists R'.D$ and $n$ for $E = \exists_{\geq n}\,R'$, if $R' \in RS$, and 0 otherwise.

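The procedure collect_successor_pmodels is applied to a concept $C$ of the
form $\exists S.D$ or $\exists_{\geq n}\,S$ and a set of pmodels $MS$. It computes the set $Q$ containing
all $S$-successor pmodels (by considering (transitive) superroles of $S$). We define
$Q_{aux} = \{D\}$ if $C = \exists S.D$ and $Q_{aux} = \emptyset$ otherwise. Observe that $\exists R.E \in M^{\forall}$
implies that $R$ is a feature. The procedure collect_successor_pmodels returns
the pmodel set $\{M_C \mid C \in Q\}$.

$$Q = Q_{aux} \cup \{E \mid \exists M \in MS,\ R \in S^{\uparrow} : (\forall R.E \in M^{\forall} \vee \exists R.E \in M^{\forall})\} \cup \{\forall \mathsf{T}.E \mid \exists M \in MS,\ R \in S^{\uparrow},\ \mathsf{T} \in T \cap S^{\uparrow} \cap R^{\downarrow} : \forall R.E \in M^{\forall}\}$$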

Note that mergable depends on the clash triggers of the particular tableaux 
calculus chosen since it has to detect potential clashes in a set of pmodels. The 
structure and composition of the completion rules might vary as long as the 
clash triggers do not change and the calculus remains sound and complete. 
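The interplay of Procedure 1 and its helper procedures can be sketched in Python. This is a simplified illustration under explicit assumptions, not the implementation used in RACE: role hierarchies and transitive roles are ignored (every role is its own only ancestor and descendant), concepts are encoded as nested tuples, and the pmodels of successor concepts are looked up in a hypothetical precomputed dictionary `cache`. The helper `collect_successor_pmodels` follows the definition of $Q$ literally.

```python
from collections import namedtuple

# Concepts are nested tuples (an encoding assumed for this sketch):
#   ("atom", "A"), ("some", role, concept), ("all", role, concept),
#   ("atleast", n, role), ("atmost", n, role)
PModel = namedtuple("PModel", "pos neg exists forall")
BOTTOM = "BOTTOM"  # stands for the pmodel of an unsatisfiable concept


def pm(pos=(), neg=(), exists=(), forall=()):
    """Build a hashable pmodel from four iterables."""
    return PModel(frozenset(pos), frozenset(neg), frozenset(exists), frozenset(forall))


def role_of(c):
    return c[1] if c[0] in ("some", "all") else c[2]


def atoms_mergable(ms):
    # False if some atom is positive in one pmodel and negated in another.
    return not any(m1.pos & m2.neg for m1 in ms for m2 in ms)


def num(e, role):
    if role_of(e) != role:
        return 0
    return e[1] if e[0] == "atleast" else 1


def critical_at_most(c, m, ms):
    role = role_of(c)
    n_set = set().union(*(x.exists for x in ms))
    total = sum(num(e, role) for e in n_set)
    return any(
        r[0] == "atmost" and r[2] == role and total > r[1]
        for other in ms if other != m
        for r in other.forall
    )


def collect_successor_pmodels(c, ms, cache):
    role = role_of(c)
    q = {c[2]} if c[0] == "some" else set()  # Q_aux
    for m in ms:
        for r in m.forall:
            if r[0] in ("all", "some") and r[1] == role:
                q.add(r[2])
    return frozenset(cache[e] for e in q)


def mergable(ms, vm, deep, cache):
    ms = frozenset(ms)
    if not ms or ms in vm:
        return True
    if BOTTOM in ms or not atoms_mergable(ms):
        return False
    for m in ms:
        for c in m.exists:
            if critical_at_most(c, m, ms):
                return False
            succ = collect_successor_pmodels(c, ms, cache)
            if (not deep and succ) or not mergable(succ, vm | {ms}, deep, cache):
                return False
    return True


# Hypothetical cache and two pmodels that interact via the roles R and S.
C, D = ("atom", "C"), ("atom", "D")
some_s_c, all_s_d = ("some", "S", C), ("all", "S", D)
cache = {
    C: pm(pos={"C"}),
    D: pm(pos={"D"}),
    some_s_c: pm(exists={some_s_c}),
    all_s_d: pm(forall={all_s_d}),
}
m1 = pm(exists={("some", "R", some_s_c)})
m2 = pm(forall={("all", "R", all_s_d)})
print(mergable({m1, m2}, frozenset(), True, cache))
print(mergable({m1, m2}, frozenset(), False, cache))
```

In the deep mode the sketch follows the interaction on `R` and then on `S` and finally reaches two atomic pmodels without a clash, whereas the flat mode already gives up at the first non-empty set of successor pmodels.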

Proposition 1 (Soundness of mergable). Let $D?$ have either the value true
or false, $CS = \{C_1, \dots, C_n\}$, $M_{C_i} =$ get_pmodel($C_i$), and $PM = \{M_{C_i} \mid i \in 1..n\}$.
If the procedure call mergable($PM$, $\emptyset$, $D?$) returns true, the concept
$C_1 \sqcap \dots \sqcap C_n$ is satisfiable.

Proof. This is proven by contradiction and induction. Let us assume that the call
mergable($PM$, $\emptyset$, $D?$) returns true but the initial ABox $\mathcal{A} = \{a{:}(C_1 \sqcap \dots \sqcap C_n)\}$
is inconsistent, i.e. there exists no completion of $\mathcal{A}$. Every concept $C_i$ must be
satisfiable, otherwise we would have $\bot \in PM$ and mergable would return false
due to line 3 in Procedure 1. Let us assume a finite set $\mathcal{C}$ containing all
contradictory ABoxes encountered during the consistency test of $\mathcal{A}$. Without loss
of generality we can select an arbitrary $\mathcal{A}' \in \mathcal{C}$ and make a case analysis of its
possible clash culprits.

⁵ In the following let us assume that the concept $C$ mentions a role name $R$.

1. We have a primitive clash for the “root” individual $a$, i.e. $\{a{:}D, a{:}\neg D\} \subseteq \mathcal{A}'$.
Thus, $a{:}D$ and $a{:}\neg D$ have not been propagated to $a$ via role assertions and
there have to exist $C_i, C_j \in CS$, $i \neq j$ such that $a{:}D$ ($a{:}\neg D$) is derived from
$a{:}C_i$ ($a{:}C_j$) due to the satisfiability of the concepts $C_i$, $i \in 1..n$. It holds for
the associated pmodels $M_{C_i}, M_{C_j} \in PM$ that $D \in M_{C_i}^{A} \cap M_{C_j}^{\neg A}$. However, due
to our assumption the call of mergable($PM$, $\emptyset$, $D?$) returned true. This is a
contradiction since mergable called atoms_mergable with $PM$ (line 3 in
Procedure 1) which returned false since $D \in M_{C_i}^{A} \cap M_{C_j}^{\neg A}$.

2. A number restriction clash in $\mathcal{A}'$ is detected for $a$, i.e. $a{:}\exists_{\leq m}\,R \in \mathcal{A}'$ and
there exist $l > m$ distinct $R$-successors of $a$.⁶ These successors can only be derived
from assertions of the form $a{:}\exists S_j.E_j$ or $a{:}\exists_{\geq n_j}\,S_j$ with $S_j \in R^{\downarrow}$, $j \in 1..p$.
The concepts $C_i \in CS$, $i \in 1..n$ are satisfiable and there has to exist a subset
$CS' \subseteq CS$ such that $\sum_{E' \in N} num(E', RS) \geq l > m$,
$N = \bigcup_{C \in CS'} M_C^{\exists}$, $RS = (\bigcup_{j \in 1..p} S_j^{\uparrow}) \cap R^{\downarrow}$. However, due to our assumption
the call of mergable($PM$, $\emptyset$, $D?$) returned true. This is a contradiction since
there exists a pmodel $M_C$, $C \in CS'$ and a concept $E' \in M_C^{\exists}$ such that mergable
called critical_at_most($E'$, $M_C$, $PM$) (lines 6-8 in Procedure 1) which
returned true since $\sum_{E' \in N} num(E', RS) \geq l > m$.

3. Let the individual $a_n$ be a successor of $a_0$ via a chain of role assertions
$(a_0, a_1){:}R_1, \dots, (a_{n-1}, a_n){:}R_n$, $n > 0$ and we now assume that a clash for $a_n$
is discovered.

a) In case of a primitive clash we have $\{a_n{:}D, a_n{:}\neg D\} \subseteq \mathcal{A}'$. Without loss
of generality we may assume that the clash culprits can only be derived
from assertions of the form $a_{n-1}{:}\exists_{\geq m}\,R_n$ or $a_{n-1}{:}\exists R_n.E_1$ in combination
with $a_{n-1}{:}\exists S'.E_2$ (if $R_n$ and $S' \in R_n^{\uparrow}$ are features), and/or $a_{n-1}{:}\forall S''.E_3$
with $S'' \in R_n^{\uparrow}$. Due to the clash there exists a pair $\{E', E''\} \subseteq \{E_1, E_2, E_3\}$
with $D \in M_{E'}^{A} \cap M_{E''}^{\neg A}$. Each role assertion in the chain between $a_0$ and
$a_{n-1}$ can only be derived from assertions of the form $a_{k-1}{:}\exists R_k.E_k$ or
$a_{k-1}{:}\exists_{\geq m_k}\,R_k$ with $k \in 1..n-1$. The call graph of mergable($PM$, $\emptyset$, $D?$)
contains a chain of calls resembling the chain of role assertions. By
induction on the call graph we know that the node resembling $a_{n-1}$ of
this call graph chain contains the call mergable($PM'$, $VM'$, true) such
that $\{M_{E'}, M_{E''}\} \subseteq PM'$ and atoms_mergable has been called with
a set $MS'$ and $\{M_{E'}, M_{E''}\} \subseteq MS'$. The call of atoms_mergable has
returned false since $D \in M_{E'}^{A} \cap M_{E''}^{\neg A}$. This contradicts our assumption
that mergable($PM$, $\emptyset$, $D?$) returned true.

b) In case of a number restriction clash we can argue in an analogous way
to case 2 and 3a. Again, we have a chain of role assertions where a
number restriction clash is detected for the last individual of the chain.
There exists a corresponding call graph chain where by induction the last call
of mergable called critical_at_most with a set of pmodels for which
critical_at_most returned true. This contradicts the assumption that
mergable($PM$, $\emptyset$, $D?$) returned true. □

⁶ Due to our syntax restriction, the elements of $R^{\downarrow}$ are not transitive.

[Figure 2: runtime charts with two panels, (a) Galen TBoxes and (b) Bike TBoxes (Bike1 to Bike9).]

Fig. 2. Evaluation of model merging techniques (runtime in seconds, 3 runs for each
TBox, left-right order corresponds to top-bottom order in the legend).

It is easy to see that this proof also holds if the value of $D?$ is false since the
“flat mode” is more conservative than the “deep” one, i.e. it will always return
false instead of possibly true if the set of collected pmodels $MS'$ is not empty
(line 12 in Procedure 1).

The advantage of the deep vs. the flat mode of the model merging technique 
is demonstrated by empirical tests using a set of “quasi-standard” application 
TBoxes [7,6,2]. Figure 2 shows the runtimes for computing the subsumption lat- 
tice of these TBoxes. Each TBox is iteratively classified using three different 
parameter settings. The first setting has the deep mode of model merging en- 
abled, the second one has the deep mode of model merging disabled but the flat 
mode still enabled, and the third one has model merging completely disabled. 
The comparison between settings one and two indicates a speed-up in runtimes by
a factor of 1.5 to 2 if the deep mode is enabled. The results for setting three clearly
demonstrate the principal advantage of model merging.

The principal advantage of the deep vs. the flat model merging mode is due 
to the following characteristics. If the flat model merging test is (recursively) 
applied during tableaux expansion and repeatedly returns false because of in- 
teracting value and exists restrictions, this test might be too conservative. This 
effect is illustrated by an example: The deep model merging test starts with
the pmodels $\langle \emptyset, \emptyset, \{\exists R.\exists S.C\}, \emptyset \rangle$ and $\langle \emptyset, \emptyset, \emptyset, \{\forall R.\forall S.D\} \rangle$. Due to interaction
on the role $R$, the test is recursively applied to the pmodels $\langle \emptyset, \emptyset, \{\exists S.C\}, \emptyset \rangle$
and $\langle \emptyset, \emptyset, \emptyset, \{\forall S.D\} \rangle$. Eventually, the deep model merging test succeeds with
the pmodels $\langle \{C\}, \emptyset, \emptyset, \emptyset \rangle$ and $\langle \{D\}, \emptyset, \emptyset, \emptyset \rangle$ and returns true. This is in contrast
to the flat mode where in this example no tableaux tests are avoided and the
runtime for the model merging tests is wasted.

The next section describes how model merging can be utilized for obtaining 
a dramatic speed up of ABox reasoning. 



3 Flat Models for ABox Reasoning in $\mathcal{ALCNH}_{R^+}$

Computing the direct types of an individual a (i.e. the set of the most specific 
concepts from C of which an individual a is an instance) is called realization 
of $a$. For instance, in order to compute the direct types of $a$ for a given
subsumption lattice of the concepts $D_1, \dots, D_n$, a sequence of ABox consistency
tests for $\mathcal{A}_{D_i} = \mathcal{A} \cup \{a{:}\neg D_i\}$ might be required. However, individuals are usually
members of only a small number of concepts and the ABoxes $\mathcal{A}_{D_i}$ are proven as
consistent in most cases. The basic idea is to design a cheap but sound model
merging test for the focused individual $a$ and the concept terms $\neg D_i$ without
explicitly considering role assertions and concept assertions for all the other in- 
dividuals mentioned in A. These “interactions” are reflected in the “individual 
pseudo model” of a. This is the motivation for devising the novel individual 
model merging technique. 

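A pseudo model for an individual $a$ mentioned in a consistent initial ABox $\mathcal{A}$
w.r.t. a TBox $\mathcal{T}$ is defined as follows. Since $\mathcal{A}$ is consistent, there exists a set of
completions $\mathcal{C}$ of $\mathcal{A}$. Let $\mathcal{A}' \in \mathcal{C}$. An individual pseudo model $M$ for an individual
$a$ in $\mathcal{A}$ is defined as the tuple $\langle M^{A}, M^{\neg A}, M^{\exists}, M^{\forall} \rangle$ w.r.t. $\mathcal{A}'$ and $\mathcal{A}$ using the
same definitions from the previous section for the components $M^{A}$, $M^{\neg A}$, $M^{\forall}$
and the following definition.

$$M^{\exists} = \{\exists R.C \mid a{:}\exists R.C \in \mathcal{A}'\} \cup \{\exists_{\geq n}\,R \mid a{:}\exists_{\geq n}\,R \in \mathcal{A}'\} \cup \{\exists_{\geq 1}\,R \mid (a,b){:}R \in \mathcal{A}\}$$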

Note the distinction between the initial ABox $\mathcal{A}$ and its completion $\mathcal{A}'$. Whenever
a role assertion exists which specifies a role successor for the individual $a$
in the initial ABox, a corresponding at least restriction is added to the set $M^{\exists}$.
This is based on the rationale that the cached pmodel of $a$ cannot refer to individual
names. However, it is sufficient to reflect a role assertion $(a,b){:}R \in \mathcal{A}$ by
adding a corresponding at least restriction to $M^{\exists}$. This guarantees that possible
interactions via the role $R$ are detected. Note that individual model merging is
only defined for the flat mode of model merging.

Proposition 2 (Soundness of individual model merging). Let $M_a$ be the
pmodel of an individual $a$ mentioned in a consistent initial ABox $\mathcal{A}$, $M_{\neg C}$ be the
pmodel of a satisfiable concept $\neg C$, and $PM = \{M_a, M_{\neg C}\}$. If the procedure call
mergable($PM$, $\emptyset$, false) returns true, the ABox $\mathcal{A} \cup \{a{:}\neg C\}$ is consistent, i.e. $a$
is not an instance of $C$.
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Proof. This is proven by contradiction. Let us assume that the procedure call 
mergable({Ma, M^c}, false) returns true but the ABox A' = AVJ {a :->C} is 
inconsistent, i.e. there exists no completion of A' . Let us assume a finite set C 
containing all contradictory ABoxes encountered during the consistency test of 
A' ■ Without loss of generality we can select an arbitrary A” G C and make a 
case analysis of its possible clash culprits. 

1. In case of a primitive clash for a we have $\{a : D, a : \neg D\} \subseteq A''$. Since A is
consistent and the concept $\neg C$ cannot indirectly refer to the old individual
a via a role chain, we know that either $a : D$ or $a : \neg D$ must be derived from
$a : \neg C$, and we have $D \in (M_a^{A} \cap M_{\neg C}^{\neg A}) \cup (M_a^{\neg A} \cap M_{\neg C}^{A})$. This contradicts the
assumption that the call mergable($\{M_a, M_{\neg C}\}$, $\emptyset$, false) returned true, since
mergable called atoms_mergable($\{M_a, M_{\neg C}\}$), which returned false (line
3 in Procedure 1) since $D \in (M_a^{A} \cap M_{\neg C}^{\neg A}) \cup (M_a^{\neg A} \cap M_{\neg C}^{A})$.

2. A number restriction clash in $A''$ is detected for a, i.e. $a : \exists_{\leq m} R \in A''$ and
there exist $l > m$ distinct R-successors of a in $A''$. This implies that
the set $N = M_a^{\exists} \cup M_{\neg C}^{\exists}$ contains concepts of the form $\exists S_j . E_j$ or $\exists_{\geq n_j} S_j$,
$S_j \in R^{\downarrow}$, $j \in 1..k$, such that $\sum_{E' \in N} num(E', RS) \geq l$, $RS = (\bigcup_{j \in 1..k} S_j^{\uparrow}) \cap R^{\downarrow}$.
This contradicts the assumption that mergable($\{M_a, M_{\neg C}\}$, $\emptyset$, false) returned
true, since mergable called critical_at_most (lines 6-8 in Procedure 1),
which returned true since $\sum_{E' \in N} num(E', RS) \geq l > m$.

3. A clash is detected for an individual b in $A''$ that is distinct from a. Since
A is consistent, the individual b must be a successor of a via a chain of
role assertions $(a, b_1) : R_1, \ldots, (b_n, b) : R_{n+1}$, $n \geq 0$, and one of the clash
culprits must be derived from the newly added assertion $a : \neg C$ and propagated
to b via the role assertion chain originating from a with $(a, b_1) : R_1$.
Since $\neg C$ is satisfiable and A is consistent, we have an “interaction” via
the role or feature $R_1$. This implies for the associated pmodels $M_a$, $M_{\neg C}$
that $(M_a^{\forall} \cap M_{\neg C}^{\exists}) \cup (M_a^{\exists} \cap M_{\neg C}^{\forall}) \neq \emptyset$. This contradicts the assumption that
mergable($\{M_a, M_{\neg C}\}$, $\emptyset$, false) returned true, since mergable eventually
called collect_successor_pmodels for $M_a$, $M_{\neg C}$, which returned a
non-empty set (line 11 in Procedure 1). □



The performance gain by the individual model merging technique is empir- 
ically evaluated using a set of five ABoxes containing between 15 and 25 indi- 
viduals. Each of these ABoxes is realized w.r.t. the application TBoxes Bike7-9 
derived from a bike configuration task. The TBoxes especially vary on the degree 
of explicit disjointness declarations between atomic concepts. Figure 3 shows the 
runtimes for the realization of the ABoxes 1-5. Each ABox is realized with two 
different parameter settings. The first setting has the individual model merging 
technique enabled, the second one has it disabled. The comparison between both 
settings reveals a speed gain of at least one order of magnitude if the individual 
model merging technique is used. Note the use of a logarithmic scale. 

^ Any role assertion of the form $(a, b) : R \in A$ implies that $\exists_{\geq 1} R \in M_a^{\exists}$. This takes care
of implied at least restrictions due to the UNA for old individuals.
Fig. 3. Bike ABoxes: Evaluation of model merging techniques (runtime in seconds, 2
runs for each ABox, left-right order corresponds to top-bottom order in the legend).



4 Pseudo Models for Reasoning with Concrete Domains 

The requirements derived from practical applications of DLs ask for more
expressiveness w.r.t. reasoning about objects from other domains (so-called concrete
domains, e.g. for the real numbers). Thus, in [4] the logic $\mathcal{ALCNH}_{R^+}$ is extended
with a restricted form of reasoning about concrete domains. However, the
classification of non-trivial TBoxes is only feasible if the model merging technique
can be applied. Therefore, we extend the model merging technique to the basic
DL with concrete domains, the language $\mathcal{ALC(D)}$ [1]. We conjecture that the
results from this approach can be directly transferred to the logic presented in
[4]. First, we have to briefly introduce $\mathcal{ALC(D)}$.



4.1 The Language $\mathcal{ALC(D)}$

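A concrete domain $\mathcal{D}$ is a pair $(\Delta_{\mathcal{D}}, \Phi_{\mathcal{D}})$, where $\Delta_{\mathcal{D}}$ is a set called the domain,
and $\Phi_{\mathcal{D}}$ is a set of predicate names. Each predicate name $P_{\mathcal{D}}$ from $\Phi_{\mathcal{D}}$ is
associated with an arity n and an n-ary predicate $P^{\mathcal{D}} \subseteq \Delta_{\mathcal{D}}^{n}$. A concrete domain
$\mathcal{D}$ is called admissible iff (1) the set of predicate names $\Phi_{\mathcal{D}}$ is closed under
negation and $\Phi_{\mathcal{D}}$ contains a name $\top_{\mathcal{D}}$ for $\Delta_{\mathcal{D}}$; (2) the satisfiability problem
$P_1^{n_1}(x_{11}, \ldots, x_{1n_1}) \wedge \ldots \wedge P_m^{n_m}(x_{m1}, \ldots, x_{mn_m})$ is decidable (m is finite, $P_i^{n_i} \in \Phi_{\mathcal{D}}$,
and $x_{jk}$ is a name for an object from $\Delta_{\mathcal{D}}$).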

Let $\mathcal{S}$ and $\mathcal{F}$ ($\mathcal{R} = \mathcal{S} \cup \mathcal{F}$) be disjoint sets of role and feature names,
respectively. A composition of features (written $F_1 \cdots F_n$) is called a feature chain. A
simple feature is a feature chain of length 1. Let $\mathcal{C}$ be a set of concept names
which is disjoint from $\mathcal{R}$. Any element of $\mathcal{C}$ is a concept term. If C and D are
concept terms, $R \in \mathcal{R}$ is a role or feature name, $P \in \Phi_{\mathcal{D}}$ is a predicate name from
an admissible concrete domain, and the $u_i$ are feature chains, then the following
expressions are also concept terms: $C \sqcap D$, $C \sqcup D$, $\neg C$, $\forall R.C$, $\exists R.C$,
$\exists u_1, \ldots, u_n . P$. A concept term of the last kind is called a predicate exists
restriction.

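An interpretation $\mathcal{I}_{\mathcal{D}} = (\Delta^{\mathcal{I}}, \Delta_{\mathcal{D}}, \cdot^{\mathcal{I}})$ consists of a set $\Delta^{\mathcal{I}}$ (the abstract
domain), a set $\Delta_{\mathcal{D}}$ (the domain of an admissible ‘concrete domain’ $\mathcal{D}$) and an
interpretation function $\cdot^{\mathcal{I}}$. Except for feature and predicate names, the
interpretation function is defined as in Figure 1a. The function maps each feature name
F from $\mathcal{F}$ to a partial function $F^{\mathcal{I}}$ from $\Delta^{\mathcal{I}}$ to $\Delta^{\mathcal{I}} \cup \Delta_{\mathcal{D}}$, and each predicate name
P from $\Phi_{\mathcal{D}}$ with arity n to a subset $P^{\mathcal{I}}$ of $\Delta_{\mathcal{D}}^{n}$. For a feature chain $u = F_1 \cdots F_n$,
$u^{\mathcal{I}}$ denotes the composition $F_1^{\mathcal{I}} \circ \cdots \circ F_n^{\mathcal{I}}$ of partial functions $F_1^{\mathcal{I}}, \ldots, F_n^{\mathcal{I}}$. Let
$u_1, \ldots, u_n$ be feature chains and let P be a predicate name. Then, the
interpretation function can be extended to concept terms as in Figure 1a. The semantics
for the predicate exists restrictions is given by:

$(\exists u_1, \ldots, u_n . P)^{\mathcal{I}} := \{ a \in \Delta^{\mathcal{I}} \mid \exists x_1, \ldots, x_n \in \Delta_{\mathcal{D}} : (a, x_1) \in u_1^{\mathcal{I}}, \ldots, (a, x_n) \in u_n^{\mathcal{I}}, (x_1, \ldots, x_n) \in P^{\mathcal{I}} \}$

Note that in a concept term elements of $\Delta_{\mathcal{D}}$ can be used only as feature fillers.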

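A TBox T is a finite set of non-cyclic axioms of the form $A \sqsubseteq D$ or $A \doteq D$
where A must be a concept name. An interpretation $\mathcal{I}$ is a model of a TBox T
iff it satisfies $A^{\mathcal{I}} \subseteq D^{\mathcal{I}}$ ($A^{\mathcal{I}} = D^{\mathcal{I}}$) for all $A \sqsubseteq D$ ($A \doteq D$) in T.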

An ABox A is a finite set of assertional axioms which are defined as follows:
Let O be a set of individual names and let $\mathcal{X}$ be a set of names for
concrete objects ($\mathcal{X} \cap O = \emptyset$). If C is a concept term, $R \in \mathcal{R}$, $F \in \mathcal{F}$, $a, b \in O$
and $x, x_1, \ldots, x_n \in \mathcal{X}$, then the following expressions are assertional axioms: $a : C$,
$(a,b) : R$, $(a,x) : F$ and $(x_1, \ldots, x_n) : P$.

The interpretation function additionally maps every individual name from O
to a single element of $\Delta^{\mathcal{I}}$, and names for concrete objects from $\mathcal{X}$ are mapped to
elements of $\Delta_{\mathcal{D}}$. (The UNA does not necessarily hold in $\mathcal{ALC(D)}$.) An
interpretation satisfies an assertional axiom $a : C$ iff $a^{\mathcal{I}} \in C^{\mathcal{I}}$, $(a,b) : R$ iff $(a^{\mathcal{I}}, b^{\mathcal{I}}) \in R^{\mathcal{I}}$,
$(a,x) : F$ iff $(a^{\mathcal{I}}, x^{\mathcal{I}}) \in F^{\mathcal{I}}$, and $(x_1, \ldots, x_n) : P$ iff $(x_1^{\mathcal{I}}, \ldots, x_n^{\mathcal{I}}) \in P^{\mathcal{I}}$. An
interpretation $\mathcal{I}$ is a model of an ABox A w.r.t. a TBox T iff it is a model of T and
furthermore satisfies all assertional axioms in A.



4.2 Pseudo Models for TBox Reasoning in $\mathcal{ALC(D)}$

By analogy to the previous sections, we assume a tableaux calculus which
decides the ABox consistency problem for $\mathcal{ALC(D)}$ (see [1]). The clash triggers in
this calculus are the primitive clash, two triggers for feature fillers with
membership to both domains, and one clash trigger indicating inconsistencies between
concrete domain objects.

In the following we assume the same naming conventions as used above.
In order to obtain a flat pseudo model for a concept C, the consistency of
$A = \{a : C\}$ is tested. If A is inconsistent, the pseudo model of C is defined as
$\bot$. If A is consistent, then there exists a set of completions $\mathcal{C}$. A completion
$A' \in \mathcal{C}$ is selected and a pmodel M for a concept C is defined as the tuple
$\langle M^{A}, M^{\neg A}, M^{\exists}, M^{\forall}, M^{\exists_F}, M^{\forall_F} \rangle$ using the following definitions (let
$u = F_1 \cdots F_n$ be a feature chain, then $first(u) = F_1$).

$M^{A} = \{A \mid a : A \in A'\}$, $M^{\neg A} = \{A \mid a : \neg A \in A'\}$,

$M^{\exists} = \{R \mid a : \exists R.C \in A'\}$, $M^{\forall} = \{R \mid a : \forall R.C \in A'\}$,

$M^{\exists_F} = \{F \mid a : \exists F.C \in A'\} \cup \{F \mid F = first(u_j), u_j \text{ used in } \exists u_1, \ldots, u_n . P, a : \exists u_1, \ldots, u_n . P \in A'\}$,

$M^{\forall_F} = \{F \mid a : \forall F.C \in A'\}$

Note that sets from a flat pseudo model for an $\mathcal{ALC(D)}$ concept contain only
concept, role, and/or feature names. In order to correctly deal with the semantics
of features, the pmodel also contains separate sets $M^{\exists_F}$ and $M^{\forall_F}$. The set
$M^{\exists_F}$ contains all feature names mentioned in exists restrictions and all feature
names being first element of a feature chain in predicate exists restrictions, and
the set $M^{\forall_F}$ contains all feature names mentioned in value restrictions.

The following procedure $\mathcal{ALC(D)}$-mergable implements the flat model
merging test for $\mathcal{ALC(D)}$ for a given non-empty set of pmodels MS.



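Procedure 2 $\mathcal{ALC(D)}$-mergable(MS)
  if $\bot \in MS \vee \neg$atoms_mergable(MS) then
    return false
  else
    for all pairs $\{M_1, M_2\} \subseteq MS$ do
      if $(M_1^{\exists} \cap M_2^{\forall}) \neq \emptyset \vee (M_1^{\forall} \cap M_2^{\exists}) \neq \emptyset$ then
        return false
      else if $(M_1^{\exists_F} \cap M_2^{\forall_F}) \neq \emptyset \vee (M_1^{\forall_F} \cap M_2^{\exists_F}) \neq \emptyset \vee (M_1^{\exists_F} \cap M_2^{\exists_F}) \neq \emptyset$ then
        return false
    return true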



The idea of this test is to check for possible primitive clashes at the “root
individual” of the pmodels in MS using atoms_mergable. Then the procedure
$\mathcal{ALC(D)}$-mergable checks for possible references to the same direct role or
feature filler by more than one pmodel in MS.
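The flat merging test can be sketched in Python as follows. This is only an illustration under assumptions: the `PModel` record, its field names, the use of `None` for the pseudo model of an unsatisfiable concept, and the body of `atoms_mergable` (which belongs to Procedure 1 and is not shown in this section) are all hypothetical. The third feature check assumes that two exists restrictions on the same feature may interact because features are functional.

```python
from dataclasses import dataclass
from itertools import combinations

BOTTOM = None  # assumed stand-in for the pmodel of an unsatisfiable concept


@dataclass(frozen=True)
class PModel:
    """Hypothetical flat pseudo model: six sets of names."""
    atoms: frozenset = frozenset()       # concept names asserted positively
    neg_atoms: frozenset = frozenset()   # concept names asserted negatively
    ex_roles: frozenset = frozenset()    # roles used in exists restrictions
    all_roles: frozenset = frozenset()   # roles used in value restrictions
    ex_feats: frozenset = frozenset()    # features used in (predicate) exists restrictions
    all_feats: frozenset = frozenset()   # features used in value restrictions


def atoms_mergable(ms):
    """Assumed behaviour: no concept name occurs positively in one pmodel
    and negatively in another."""
    return not any((m1.atoms & m2.neg_atoms) or (m1.neg_atoms & m2.atoms)
                   for m1, m2 in combinations(ms, 2))


def alcd_mergable(ms):
    """Conservative test: True means no interaction between the pmodels
    was found; False means an interaction is possible."""
    if any(m is BOTTOM for m in ms) or not atoms_mergable(ms):
        return False
    for m1, m2 in combinations(ms, 2):
        # The same role is used by an exists and a value restriction.
        if (m1.ex_roles & m2.all_roles) or (m1.all_roles & m2.ex_roles):
            return False
        # The same feature is referenced by both pmodels.
        if ((m1.ex_feats & m2.all_feats) or (m1.all_feats & m2.ex_feats)
                or (m1.ex_feats & m2.ex_feats)):
            return False
    return True


m_c = PModel(atoms=frozenset({"A"}), ex_feats=frozenset({"age"}))
m_d = PModel(atoms=frozenset({"B"}), ex_feats=frozenset({"age"}))
m_e = PModel(neg_atoms=frozenset({"B"}), ex_roles=frozenset({"R"}))
```

A result of `False` does not show unsatisfiability; it only means that the cheap test is inconclusive and a full consistency test is needed.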

This easy, but conservative test handles, besides primitive clashes, the three
$\mathcal{ALC(D)}$-specific clash triggers, because they can only appear at feature fillers. A
proof for the soundness of $\mathcal{ALC(D)}$-mergable can therefore be easily adapted
from the one given in Section 2. Due to lack of space, we cannot present the
model merging technique for deep pseudo models which is described in [9] where
this technique is also extended for other DLs with concrete domains. Full proofs
for flat and deep model merging for $\mathcal{ALC(D)}$ can be found in [9].

5 Conclusion and Future Work 

In this paper we have analyzed optimization techniques for TBox and ABox
reasoning in the expressive description logic $\mathcal{ALCNH}_{R^+}$. These techniques
exploit the traversal of flat and/or deep pmodels extracted from ABox consistency
tests. A moderate speed gain using deep models for classification of concepts
and a dramatic gain for realization of ABoxes is empirically demonstrated. The
model merging technique has also been investigated for the logic $\mathcal{ALC(D)}$ with
concrete domains. We conjecture that individual model merging for $\mathcal{ALC(D)}$ can
be developed in analogy to Section 3. The model merging technique for $\mathcal{ALC(D)}$
is a prerequisite in order to apply model merging to $\mathcal{ALCNH}_{R^+}$ extended by
concrete domains.

It is easy to see that an enhanced version of the individual model merging
technique for $\mathcal{ALCNH}_{R^+}$ can be developed, which additionally exploits the use
of deep models. This is immediately possible if only ABoxes containing no joins
for role assertions are encountered. In case an ABox A contains a join (e.g.
$\{(a, c) : R, (b, c) : R\} \subseteq A$), one has to consider a graph-like instead of a tree-like
traversal of pseudo models reflecting the dependencies caused by joins.
Abstract. We present an ExpTime decision procedure for the full
μ-calculus (including converse programs) extended with nominals and a
universal program, thus devising a new, highly expressive ExpTime logic.

The decision procedure is based on tree automata, and makes explicit the 
problems caused by nominals and how to overcome them. Roughly speak- 
ing, we show how to reason in a logic lacking the tree model property 
using techniques for logics with the tree model property. The contribu- 
tion of the paper is two-fold: we extend the family of ExpTime logics, 
and we present a technique to reason in the presence of nominals. 

1 Introduction 

Description Logics (DLs) are a family of knowledge representation formalisms 
designed for the representation of and reasoning about terminological knowl- 
edge [34,28,2]. Over the last years, they turned out to be also well-suited for 
the representation of and reasoning about, e.g., ontologies [31,16] and database 
schemata, where they can support schema design, evolution, and query optimi- 
sation [7], source integration in heterogeneous databases/data warehouses [6], 
and conceptual modeling of multidimensional aggregation [18]. 

The basic notions of DLs are concepts (classes, unary predicates) and roles
(binary predicates). A specific DL is mainly characterised by a set of constructors
that allow complex concepts and roles to be formed from atomic ones. A standard DL
knowledge base consists of two parts: in the TBox, the vocabulary of a given
application domain is fixed. Some TBox formalisms only allow names to be introduced
for complex concepts, whereas others additionally allow general axioms such as
$C \doteq D$ or $C \sqsubseteq D$ to be stated for two (possibly complex) concepts [11,22].
The second part of a DL knowledge base, the ABox, states facts concerning
concrete individuals. Using the vocabulary fixed in the TBox, we can state in an
ABox that the individual a is an instance of, e.g., the concept CMReactor, and
that it is related via the role has-part to an individual b. Given such a “hybrid”
knowledge base, interesting reasoning problems include the computation of the
taxonomy (i.e., the hierarchy w.r.t. the subsumption relation) of those concepts
defined in the TBox, finding inconsistent concepts defined in the TBox, and
finding, for an individual a in the ABox, the most specific concepts defined in
the TBox that a is an instance of.

To be of use in a specific application, a DL must provide the means to de- 
scribe properties of objects that are relevant for this application. Unsurprisingly, 
the more expressive power a DL provides, the more complex the reasoning al- 
gorithms for this DL are. As a consequence, a variety of DLs were introduced 
together with investigations of the complexity of the corresponding reasoning 
algorithms/problems (see, e.g., [26,34,13]). 

In 1991, Schild described the close relationship between DLs and modal logics
or dynamic logics [32]. For example, it turned out that $\mathcal{ALC}$ is a notational
variant of multi-modal K. Following that, numerous new DLs with corresponding
complexity results emerged by (extensions of) translations into modal and
dynamic logics [9,33,10]. Due to its high expressive power, the full μ-calculus
(i.e., propositional μ-calculus extended with converse programs) can be viewed
as (one of) the “queens” of ExpTime modal/dynamic/temporal logics [23,35,40].
It is able to capture, for example, converse-PDL, CTL*, and other highly
expressive modal/dynamic/temporal logics, and thus also highly expressive DLs
[5]. Unfortunately, the μ-calculus lacks two features that are of great importance
for it being also a “queen” for DLs: it does not provide an analogue for concept
definition/general axioms that are provided by TBoxes, and it has no equivalent
to ABox individuals. The first point is not a serious one since we could
“internalise” general axioms using a greatest fixpoint formula even though the
μ-calculus does not provide (constructors to build) a universal program [32]. The
second one is more serious since, for example, the extension of the μ-calculus with
individuals no longer has the tree model property. Moreover, in the presence of
individuals, internalisation becomes more subtle.

In this paper, we extend the μ-calculus with a universal role/program to
enable direct internalisation of TBoxes [32], and with a generalised form of ABox
individuals, namely nominals, thus devising a logic where all standard inference
problems concerning TBoxes and ABoxes can be reduced to satisfiability. In
contrast to ABox individuals, nominals can be used inside complex formulae in
the same place as atomic propositions. We are able to show that the complexity
of the full μ-calculus, when extended with a universal program and nominals,
does not increase, but remains in ExpTime. To prove this upper bound, we
reduce satisfiability to the emptiness of alternating automata on infinite trees, a
family of automata that can be viewed as abstractions of tableau algorithms.
This technique is rather elegant in that it separates the logic from the algorith- 
mics [39]. For example, a tableau-based algorithm might require sophisticated 
blocking techniques to guarantee termination [22]. Using the automata-theoretic 
technique, termination is not an issue since we can work on infinite trees. More- 
over, this technique makes explicit which problems arise when reasoning in the 
presence of nominals and universal roles, and how to deal with them. We have 
chosen to deal with nominals by explicitly guessing most of the relevant infor- 
mation concerning nominals — a choice that will be explained in the sequel. 
Besides being of interest by itself and, once again, showing the power of
the automata-theoretic approach, the complexity result presented here broadens
the range of description/modal/dynamic logics that have ExpTime decision
procedures. Over the last few years, it was shown that tableau-based algorithms for
certain ExpTime-complete reasoning problems are amenable to optimisation and
behave quite well in practice [21,29,19,22]. Thus, establishing an ExpTime upper
bound is a first step in developing a practical decision procedure for the hybrid
μ-calculus or, at least, for fragments of this logic. We return to the practicality
issue at the end of the paper.

Unfortunately, this new “queen” logic is still not “the queen” since it is 
missing a prominent feature, namely number restrictions/graded modalities [17, 
12,38]. This is due to the fact that, in the presence of converse roles and universal 
programs/roles (or any other means to internalise axioms), nominals and number 
restrictions/graded modalities lead to NExpTime-hardness [37]. 

From the tense logic perspective [4], the hybrid μ-calculus can also be viewed
as one of the “queen” hybrid logics with ExpTime-complete reasoning problems:
our result extends ExpTime-completeness results for, e.g., Priorean tense logic
over transitive frames (which can be viewed as a notational variant of
multi-modal K4 with converse modalities) or converse-PDL with nominals in [1].

2 Preliminaries 

In this section, we introduce syntax and semantics of the hybrid μ-calculus as
well as two-way automata. The hybrid μ-calculus is the extension of the
propositional μ-calculus with converse programs [40], a universal role, and nominals
[30,1], i.e., atomic formulae to refer to single points.

Definition 1. Let AP be a set of atomic propositions, Var a set of propositional
variables, Nom a set of nominals, and Prog a set of atomic programs with the
universal program $o \in Prog$. A program is either an atomic program or the
converse $a^-$ of an atomic program $a \in Prog$. The set of formulae of the hybrid
μ-calculus is the smallest set such that

— true, false, p and $\neg p$ are formulae for $p \in AP \cup Nom$,

— $x \in Var$ is a formula,

— if $\varphi_1$ and $\varphi_2$ are formulae, $\alpha$ is a program, and x is a propositional variable,
then $\varphi_1 \wedge \varphi_2$, $\varphi_1 \vee \varphi_2$, $\langle\alpha\rangle \varphi_1$, $[\alpha] \varphi_1$, $\mu x.\varphi_1(x)$ and $\nu x.\varphi_1(x)$ are formulae.

A propositional variable $x \in Var$ is said to occur free in a formula if it occurs
outside the scope of a fixpoint operator. A sentence is a formula that contains no
free propositional variable, i.e., each occurrence of a variable x is in the scope of
a fixpoint operator $\mu$ or $\nu$. We use $\lambda$ to denote a fixpoint operator $\mu$ or $\nu$. For a
$\lambda$-formula $\lambda x.\varphi(x)$, we write $\varphi(\lambda x.\varphi(x))$ to denote the formula that is obtained
by replacing each free occurrence of x in $\varphi$ with $\lambda x.\varphi(x)$.

Semantics is defined by means of a Kripke structure and, in the presence
of variables and fixpoints, a valuation that associates a set of points with each
variable. Readers not familiar with fixpoints might want to look at [23,35] for
instructive examples and explanations of the semantics of the μ-calculus.

Definition 2. Semantics of the hybrid μ-calculus is given by means of a Kripke
structure K = (W, R, L), where

— W is a set of points,

— $R : Prog \to 2^{W \times W}$ assigns to an atomic program a binary relation on W,

— $R(o) = W \times W$, and

— $L : AP \cup Nom \to 2^{W}$ assigns to each atomic proposition or nominal the set
of points in which it holds, such that L(n) is a singleton for each nominal n.

R is extended to converse programs as follows: $R(a^-) = \{(v,u) \mid (u,v) \in R(a)\}$.

Given a Kripke structure K = (W, R, L) and variables $x_1, \ldots, x_m$, a
valuation $V : \{x_1, \ldots, x_m\} \to 2^{W}$ maps each variable to a subset of W. For a
valuation V, a variable x, and a set of points $W' \subseteq W$, $V[x/W']$ is the valuation
that is obtained from V by assigning W' to x.

A formula $\varphi$ with free variables among $x_1, \ldots, x_m$ is interpreted over a
Kripke structure K = (W, R, L) as a mapping $\varphi^K$ that associates, with each
valuation V, a subset $\varphi^K(V)$ of W. This mapping is defined inductively as
follows:

— $true^K(V) = W$, $false^K(V) = \emptyset$,

— for $p \in AP \cup Nom$, we have $p^K(V) = L(p)$ and $(\neg p)^K(V) = W \setminus L(p)$,

— $(\varphi_1 \wedge \varphi_2)^K(V) = \varphi_1^K(V) \cap \varphi_2^K(V)$,

  $(\varphi_1 \vee \varphi_2)^K(V) = \varphi_1^K(V) \cup \varphi_2^K(V)$,

  $(\langle\alpha\rangle \varphi)^K(V) = \{u \in W \mid \text{there is a } v \text{ with } (u,v) \in R(\alpha) \text{ and } v \in \varphi^K(V)\}$,
  $([\alpha] \varphi)^K(V) = \{u \in W \mid \text{for all } v, (u,v) \in R(\alpha) \text{ implies } v \in \varphi^K(V)\}$,

— $(\mu x.\varphi(x))^K(V) = \bigcap \{W' \subseteq W \mid \varphi^K(V[x/W']) \subseteq W'\}$,
  $(\nu x.\varphi(x))^K(V) = \bigcup \{W' \subseteq W \mid \varphi^K(V[x/W']) \supseteq W'\}$.

For a sentence $\psi$, a Kripke structure K = (W, R, L), and $w \in W$, we write
$K, w \models \psi$ iff $w \in \psi^K$, and call K a model of $\psi$.¹ A sentence that has a model
is called satisfiable.
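For finite Kripke structures, the inductive semantics can be sketched as a small evaluator. This is only an illustration under assumptions: formulae are encoded as nested tuples, programs as strings (`"o"` for the universal program, a trailing `"-"` for a converse), and all function names are hypothetical. Fixpoints are computed by iteration from the empty set (least) or from W (greatest), which agrees with the definition on finite structures because formulae in negation normal form are monotone in their variables.

```python
def relation(prog, W, R):
    """Binary relation denoted by a program name (assumed string encoding)."""
    if prog in ("o", "o-"):                       # universal program
        return {(u, v) for u in W for v in W}
    if prog.endswith("-"):                        # converse of an atomic program
        return {(v, u) for (u, v) in R.get(prog[:-1], set())}
    return set(R.get(prog, set()))


def evaluate(phi, W, R, L, V=None):
    """Set of points of W at which phi holds under valuation V."""
    V = {} if V is None else V
    op = phi[0]
    if op == "true":
        return set(W)
    if op == "false":
        return set()
    if op == "atom":                              # atomic proposition or nominal
        return set(L[phi[1]])
    if op == "not":                               # negation of an atom or nominal only
        return set(W) - set(L[phi[1]])
    if op == "var":
        return set(V[phi[1]])
    if op in ("and", "or"):
        left = evaluate(phi[1], W, R, L, V)
        right = evaluate(phi[2], W, R, L, V)
        return left & right if op == "and" else left | right
    if op in ("dia", "box"):
        rel = relation(phi[1], W, R)
        target = evaluate(phi[2], W, R, L, V)
        if op == "dia":
            return {u for (u, v) in rel if v in target}
        return {u for u in W if all(v in target for (s, v) in rel if s == u)}
    if op in ("mu", "nu"):
        x, body = phi[1], phi[2]
        current = set() if op == "mu" else set(W)
        while True:                               # terminates on finite W
            nxt = evaluate(body, W, R, L, {**V, x: current})
            if nxt == current:
                return current
            current = nxt
    raise ValueError(f"unknown operator: {op}")


W = {0, 1, 2}
R = {"a": {(0, 1), (1, 2)}}
L = {"p": {2}, "n": {0}}                          # n is a nominal: L(n) is a singleton
reach_p = ("mu", "x", ("or", ("atom", "p"), ("dia", "a", ("var", "x"))))
forever_a = ("nu", "x", ("dia", "a", ("var", "x")))
```

Here `reach_p` holds wherever p is reachable along a, and `forever_a` holds wherever an infinite a-path starts.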

Remark 1. All formulae are by definition in negation normal form, i.e., negation 
occurs only in front of atomic propositions or nominals. 

In the following, we will sometimes write $\psi(n_1, \ldots, n_\ell)$ to emphasize that
$n_1, \ldots, n_\ell$ are exactly the nominals occurring in $\psi$.

Since we will treat atomic programs and their converse symmetrically, we
will use $\overline{\alpha}$ to denote the converse of a program $\alpha$, i.e., $a^-$ if $\alpha = a$ for some atomic
program a, and b if $\alpha = b^-$ for some atomic program b. We use $Prog_\psi$ to denote
all (possibly converse) programs occurring in $\psi$.

In many decidable hybrid logics, we find formulae of the form $\varphi@n$ (to be
read as “the formula $\varphi$ holds at the nominal n”) with the semantics

¹ The interpretation of a sentence is independent of valuations.

We did not provide this operator since, in the presence of the universal role o,
we can make use of the equivalence $\varphi@n \equiv [o](\neg n \vee \varphi)$.

We note that the formula $[o]n$ is satisfied only by a structure with a single
state. This formula cannot be expressed without the use of both nominals and
the universal program.

Finally, we introduce two-way alternating automata on infinite trees. This
family of automata generalises non-deterministic tree automata in two ways:
firstly, they allow for the rather elegant and succinct alternation [27], which
allows for transitions such as “being in state q and seeing letter a, the automaton
either has an accepting run with $q_1$ from the left successor and an accepting run
with $q_2$ from the right successor, or it has an accepting run with q' from the left
successor.” To express this kind of transition, the transition function involves
positive boolean formulae instead of, e.g., sets of tuples of states as for
non-deterministic automata. Secondly, being two-way allows runs to go up and down
the input tree, similar to converse programs, which allow following programs in
both directions. When running on a k-ary tree, a two-way automaton can have
transitions going to the ith child and switching to state q' (denoted (i, q') with
$1 \leq i \leq k$), staying at the same node switching to state q' (denoted (0, q')), or
going to its (unique) predecessor and switching to state q' (denoted (−1, q')).
For an introduction to two-way alternating automata and their application to
the full μ-calculus, see [40].

Definition 3. For an integer $k \geq 1$, $(\{1, \ldots, k\}^*, V)$ is a k-ary $\Sigma$-labelled tree
if V is a mapping that associates, with each node $x \in \{1, \ldots, k\}^*$, its label
$V(x) \in \Sigma$. Intuitively, for $1 \leq i \leq k$, $x \cdot i$ is the ith child of x.

Let $B^+(X)$ be the set of positive Boolean formulae (i.e., formulae built using
$\wedge$ and $\vee$ only) over the set X. For $X' \subseteq X$, we say that X' satisfies a formula
$\theta \in B^+(X)$ iff assigning true to all elements in X' and false to all elements in
$X \setminus X'$ makes $\theta$ true.

Let $[k] = \{-1, 0, 1, \ldots, k\}$. A two-way alternating automaton on k-ary
$\Sigma$-labelled trees is a tuple $A = (\Sigma, Q, \delta, q_0, F)$, where Q is a finite set of states,
$q_0 \in Q$ is the initial state, $\delta : Q \times \Sigma \to B^+([k] \times Q)$ is the transition relation,
and F is the acceptance condition.
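The satisfaction relation between a set $X' \subseteq X$ and a positive Boolean formula can be illustrated with a short Python sketch. The tuple encoding and the function name `satisfies` are hypothetical; the constants `"true"` and `"false"` are included as an assumption, following the usual convention for transition formulae.

```python
def satisfies(chosen, theta):
    """Check whether the set `chosen` satisfies the positive Boolean formula
    `theta`: elements of `chosen` are true, all other elements are false."""
    op = theta[0]
    if op == "true":
        return True
    if op == "false":
        return False
    if op == "elem":                   # an element of X, e.g. a pair (direction, state)
        return theta[1] in chosen
    if op == "and":
        return all(satisfies(chosen, sub) for sub in theta[1:])
    if op == "or":
        return any(satisfies(chosen, sub) for sub in theta[1:])
    raise ValueError(f"unknown operator: {op}")


# The transition quoted in the text: an accepting run with q1 from the left
# successor and with q2 from the right successor, or with q' from the left one.
theta = ("or",
         ("and", ("elem", (1, "q1")), ("elem", (2, "q2"))),
         ("elem", (1, "q'")))
```

With this encoding, a non-deterministic transition corresponds to a disjunction of conjunctions that mention each direction exactly once.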

A run of A on a $\Sigma$-labelled k-ary tree (T, V) is a $(T \times Q)$-labelled tree $(T_r, r)$
that satisfies the following conditions:

— $\varepsilon \in T_r$ and $r(\varepsilon) = (\varepsilon, q_0)$,

— If $y \in T_r$ with $r(y) = (x, q)$ and $\delta(q, V(x)) = \theta$, then there is a (possibly
empty) set $S \subseteq [k] \times Q$ that satisfies $\theta$ such that, for each $(c, q') \in S$, there
is a node $y \cdot i \in T_r$ satisfying the following conditions:

— If $c = 0$, then $r(y \cdot i) = (x, q')$.

— If $c \geq 1$, then $r(y \cdot i) = (x \cdot c, q')$.

— If $c = -1$, then $x = x' \cdot i$ for some $1 \leq i \leq k$, and $r(y \cdot i) = (x', q')$.

A run $(T_r, r)$ is accepting iff all its infinite paths satisfy the acceptance condition.
Since we use tree automata for the μ-calculus, we consider the parity condition
[36]. A parity condition is given by an ascending chain of sets of states $F =
(F_0, \ldots, F_k)$ with $F_i \subseteq F_{i+1}$. Given a path P in $(T_r, r)$, let inf(P) denote the
states that are infinitely often visited by P. Then P is accepted iff the minimal
i with $inf(P) \cap F_i \neq \emptyset$ is even.
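The parity condition can be illustrated with a short Python sketch that takes the set inf(P) directly as input (for an ultimately periodic path, these are the states on its cycle). The function name and the treatment of a path that visits no set of the chain infinitely often (rejected here) are assumptions.

```python
def parity_accepts(inf_states, chain):
    """Decide acceptance of a path from its set of infinitely often visited
    states. `chain` is the ascending chain (F_0, ..., F_k) of sets of states."""
    for i, f_i in enumerate(chain):
        if inf_states & f_i:
            # i is the minimal index whose set meets inf(P).
            return i % 2 == 0
    return False  # assumption: no F_i is met, so the path is rejected


# Hypothetical chain F_0 ⊆ F_1 ⊆ F_2 over the states q0, q1, q2.
chain = (frozenset({"q0"}), frozenset({"q0", "q1"}), frozenset({"q0", "q1", "q2"}))
```

A path that visits only q2 infinitely often first meets $F_2$ and is accepted, whereas a path that visits q1 but not q0 infinitely often first meets $F_1$ and is rejected.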

For two-way alternating automata, the emptiness problem is the following
question: given a two-way alternating automaton A, is there a tree (T, V) such
that A has an accepting run on (T, V)? It was shown in [40] that this problem
is solvable in time that is exponential in the number of A’s states, where the
exponent is a polynomial in the length of the parity condition.



3 Hybrid μ-Calculus Has a Tree Model Property

As usual, when proving a tree model property for the hybrid μ-calculus, we want
to “unravel” a given model to a tree model. In the presence of nominals, this
is clearly not possible since, for example, the formula $n \wedge \langle a \rangle (m \wedge \langle b \rangle n)$ with
$n, m \in Nom$ has no model in the form of a tree. However, we will show that we
can unravel each model to a forest, i.e., a collection of trees. When unravelling,
we must choose “good” points that witness diamond formulae (i.e., a point y with
$y \in \varphi^K$ and $(x, y) \in R(\alpha)$ for $x \in (\langle\alpha\rangle \varphi)^K$), where being “good” is rather tricky
in the presence of fixpoints. To this purpose, we define a choice function that
chooses the “good” witnesses. Essentially, this choice function is a memoryless
strategy whose existence is guaranteed for parity games [14]. Definition 4
extends the standard definitions to nominals, see, e.g., [35,40].

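Definition 4. The closure $cl(\psi)$ of a sentence $\psi$ is the smallest set of sentences
that satisfies the following:

— $\psi \in cl(\psi)$,

— if $\varphi_1 \wedge \varphi_2 \in cl(\psi)$ or $\varphi_1 \vee \varphi_2 \in cl(\psi)$, then $\{\varphi_1, \varphi_2\} \subseteq cl(\psi)$,

— if $\langle\alpha\rangle \varphi \in cl(\psi)$ or $[\alpha] \varphi \in cl(\psi)$, then $\varphi \in cl(\psi)$, and

— if $\lambda x.\varphi(x) \in cl(\psi)$, then $\varphi(\lambda x.\varphi(x)) \in cl(\psi)$.

An atom $A \subseteq cl(\psi)$ of $\psi$ is a set of formulae that satisfies the following:

— if $p \in AP \cup Nom$ occurs in $\psi$, then, exclusively, either $p \in A$ or $\neg p \in A$,

— if $\varphi_1 \wedge \varphi_2 \in cl(\psi)$, then $\varphi_1 \wedge \varphi_2 \in A$ iff $\{\varphi_1, \varphi_2\} \subseteq A$,

— if $\varphi_1 \vee \varphi_2 \in cl(\psi)$, then $\varphi_1 \vee \varphi_2 \in A$ iff $\{\varphi_1, \varphi_2\} \cap A \neq \emptyset$, and

— if $\lambda x.\varphi(x) \in cl(\psi)$, then $\lambda x.\varphi(x) \in A$ iff $\varphi(\lambda x.\varphi(x)) \in A$.

The set of atoms of $\psi$ is denoted $at(\psi)$.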
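The closure of a sentence can be computed by a simple worklist procedure, sketched below in Python. The nested-tuple encoding of formulae and the function names are hypothetical. Since every element of the closure is a sentence, the substituted fixpoint formula is closed and no variable capture can occur.

```python
def substitute(phi, x, replacement):
    """Replace each free occurrence of variable x in phi by `replacement`."""
    op = phi[0]
    if op == "var":
        return replacement if phi[1] == x else phi
    if op in ("true", "false", "atom", "not"):
        return phi
    if op in ("and", "or"):
        return (op, substitute(phi[1], x, replacement),
                substitute(phi[2], x, replacement))
    if op in ("dia", "box"):
        return (op, phi[1], substitute(phi[2], x, replacement))
    if op in ("mu", "nu"):
        if phi[1] == x:                # x is rebound here, so nothing below is free
            return phi
        return (op, phi[1], substitute(phi[2], x, replacement))
    raise ValueError(f"unknown operator: {op}")


def closure(psi):
    """Smallest set of sentences containing psi and closed under the rules
    for conjunction, disjunction, modalities and fixpoint unfolding."""
    seen, todo = set(), [psi]
    while todo:
        phi = todo.pop()
        if phi in seen:
            continue
        seen.add(phi)
        op = phi[0]
        if op in ("and", "or"):
            todo.extend([phi[1], phi[2]])
        elif op in ("dia", "box"):
            todo.append(phi[2])
        elif op in ("mu", "nu"):
            # Unfold the fixpoint: phi(lambda x. phi(x)).
            todo.append(substitute(phi[2], phi[1], phi))
    return seen


psi = ("mu", "x", ("or", ("atom", "p"), ("dia", "a", ("var", "x"))))
cl_psi = closure(psi)
```

For this example the closure consists of the sentence itself, its unfolding, the atom p, and the diamond formula over the sentence.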

A pre-model $(K, \pi)$ for a sentence $\psi$ consists of a Kripke structure K =
(W, R, L) and a mapping $\pi : W \to at(\psi)$ that satisfies the following properties:

— there is a $u_0 \in W$ with $\psi \in \pi(u_0)$,

— for $p \in AP \cup Nom$, if $p \in \pi(u)$, then $u \in L(p)$, and if $\neg p \in \pi(u)$, then
$u \notin L(p)$,²

— if $\langle\alpha\rangle \varphi \in \pi(u)$, then there is a $v \in W$ with $(u, v) \in R(\alpha)$ and $\varphi \in \pi(v)$, and

— if $[\alpha] \varphi \in \pi(u)$, then $\varphi \in \pi(v)$ for each $v \in W$ with $(u, v) \in R(\alpha)$.

² Hence if a nominal n is in $\pi(u)$, then $L(n) = \{u\}$.

A choice function $ch : W \times cl(\psi) \to cl(\psi) \cup W$ for a pre-model $(K, \pi)$ of $\psi$
is a partial function such that, for each $u \in W$,

(i) if $\varphi_1 \vee \varphi_2 \in \pi(u)$, then $ch(u, \varphi_1 \vee \varphi_2) \in \{\varphi_1, \varphi_2\} \cap \pi(u)$ and

(ii) if $\langle\alpha\rangle \varphi \in \pi(u)$, then $ch(u, \langle\alpha\rangle \varphi) = v$ for some v with $(u, v) \in R(\alpha)$ and
$\varphi \in \pi(v)$.

An adorned pre-model $(K, \pi, ch)$ consists of a pre-model $(K, \pi)$ and a choice
function ch.

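For an adorned pre-model $(W, R, L, \pi, ch)$ of $\psi$, the derivation relation
$\leadsto \subseteq (cl(\psi) \times W)^2$ is defined as follows:

— if $\varphi_1 \vee \varphi_2 \in \pi(u)$, then $(\varphi_1 \vee \varphi_2, u) \leadsto (ch(u, \varphi_1 \vee \varphi_2), u)$,

— if $\varphi_1 \wedge \varphi_2 \in \pi(u)$, then $(\varphi_1 \wedge \varphi_2, u) \leadsto (\varphi_i, u)$ for each $i \in \{1, 2\}$,

— if $\langle\alpha\rangle \varphi \in \pi(u)$, then $(\langle\alpha\rangle \varphi, u) \leadsto (\varphi, ch(u, \langle\alpha\rangle \varphi))$,

— if $[\alpha] \varphi \in \pi(u)$, then $([\alpha] \varphi, u) \leadsto (\varphi, v)$ for each v with $(u, v) \in R(\alpha)$
(for $\alpha = o$, that means that $([o] \varphi, u) \leadsto (\varphi, v)$ for each $v \in W$),

— if $\lambda x.\varphi(x) \in \pi(u)$, then $(\lambda x.\varphi(x), u) \leadsto (\varphi(\lambda x.\varphi(x)), u)$.

A least-fixpoint sentence $\mu x.\varphi(x)$ is said to be regenerated from point u to point
v in an adorned pre-model $(K, \pi, ch)$ if there is a sequence $(\rho_1, u_1), \ldots, (\rho_k, u_k)$
with $k > 1$ such that $\rho_1 = \rho_k = \mu x.\varphi(x)$, $u = u_1$ and $v = u_k$, for each $1 \leq
i < k$, we have $(\rho_i, u_i) \leadsto (\rho_{i+1}, u_{i+1})$, and $\mu x.\varphi(x)$ is a sub-sentence of each
$\rho_i$. We say that $(K, \pi, ch)$ is well-founded if there is no least fixpoint sentence
$\mu x.\varphi(x) \in cl(\psi)$ and an infinite sequence $u_0, u_1, \ldots$ such that, for each $i \geq 0$,
$\mu x.\varphi(x)$ is regenerated from $u_i$ to $u_{i+1}$.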

Lemma 1. A sentence $\psi$ has a model K iff $\psi$ has a well-founded adorned
pre-model $(K, \pi, ch)$.

Proof. The construction of a model from a well-founded adorned pre-model and,
vice versa, of a well-founded adorned pre-model from a model, are analogous to
the constructions that can be found in [35]. These constructions are, as
mentioned in [40], insensitive to converse programs and, with the corresponding
modifications of the technical details, also insensitive to nominals. Indeed, nominals
behave simply like atomic propositions provided that L(n) is guaranteed to be
interpreted as a singleton. □

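Definition 5. The relaxation of a pre-model $(W, R, L, \pi)$ of a sentence
$\psi(n_1, \ldots, n_\ell)$ consists of mappings $R^r$ and $\pi^r$, where

$R^r : Prog \to 2^{W \times W}$ and

$R^r : \alpha \mapsto R(\alpha) \setminus \{(u, v) \mid \text{for some } 1 \leq i \leq \ell, L(n_i) = \{v\}\}$,

$\pi^r : W \to \{G \mid G = G_1 \cup G_2, G_1 \in at(\psi), \text{ and } G_2 \subseteq \{\uparrow^{\alpha}_{n_i} \mid \alpha \text{ occurs in } \psi, \alpha \neq o, \text{ and } 1 \leq i \leq \ell\}\}$,

$\pi^r : u \mapsto \pi(u) \cup \{\uparrow^{\alpha}_{n} \mid (u, v) \in R(\alpha), \alpha \neq o, \text{ and } L(n) = \{v\}\}$.

A relaxation is a forest if $R^r$ forms a forest.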
Lemma 2. If a sentence $\psi$ is satisfiable, then it has a well-founded adorned
pre-model whose relaxation is a forest and has $\psi$ in the label of one of its roots.

Proof. Let $\psi$ be satisfiable. Hence there is a well-founded adorned pre-model
$(K, \pi, ch)$ with K = (W, R, L) for $\psi$ due to Lemma 1. Using a technique similar
to the one in [40], we construct from $(K, \pi, ch)$ a well-founded adorned pre-model
$(K', \pi', ch')$ whose relaxation is a forest. Please note that, due to the presence
of converse programs, we cannot simply unravel K. However, we can use the
choice function to do something similar that yields the desired result also in the
presence of converse programs.

Let $\psi = \psi(n_1, \dots, n_\ell)$ and $w_0 \in W$ such that $w_0 \in \psi^K$. Let $|\psi| = n$, let
$\langle a_1 \rangle \varphi_1, \dots, \langle a_{k'} \rangle \varphi_{k'}$ be all diamond formulae in $\mathrm{cl}(\psi)$, and let $k$ be the maximum
of $k'$ and $\ell + 1$. Hence we have $k \le n$. We define a mapping $\tau : \{1, \dots, k\}^+ \to W \cup \{\bot\}$
inductively, together with an adorned pre-model $(K', \pi', ch')$ where
$K' = (W', R', L')$, $W' = \mathrm{dom}(\tau) \setminus \{x \mid \tau(x) = \bot\}$, and

– for $p \in AP \cup \mathit{Nom}$, $x \in L'(p)$ iff $\tau(x) \in L(p)$,

– $\pi'(x) = \pi(\tau(x))$,

– $ch'(x, \varphi_1 \vee \varphi_2) = ch(\tau(x), \varphi_1 \vee \varphi_2)$, and

– $R'$ and $ch'(x, \varphi)$ for diamond formulae $\varphi$ are defined inductively together
with $\tau$.

(Fix the first level) For $j$ with $1 \le j \le \ell$, let $v_{f(1)}, \dots, v_{f(\ell)} \in W$ be such
that $L(n_j) = \{v_{f(j)}\}$ and $f(1) \le \dots \le f(\ell) \le \ell$; since it is possible that
$L(n) = L(n')$ for nominals $n \ne n'$, $f$ need not be injective. For $1 \le j \le \ell$,
set $\tau(f(j)) = v_{f(j)}$.

For $w_0 \in W$ with $w_0 \in \psi^K$, if $w_0 \notin \{v_{f(1)}, \dots, v_{f(\ell)}\}$, then set $\tau(f(\ell) + 1) = w_0$.
Set $\tau(j) = \bot$ for each $1 \le j \le k$ not yet defined.

(Fix the rest) For the induction, let $i$ be such that $\tau(x)$ is already defined for
each $x \in \{1, \dots, k\}^i$, and $j$ with $1 \le j \le k$ such that $\tau(x1), \dots, \tau(x(j-1))$
is already defined for each $x \in \{1, \dots, k\}^i$. Then, for each $x \in \{1, \dots, k\}^i$,
do the following:

(1) if $\langle a_j \rangle \varphi_j \notin \pi'(x)$ or $\tau(x) = \bot$, then define $\tau(xj) = \bot$.

(2) if $\langle a_j \rangle \varphi_j \in \pi'(x)$, then (since $(K, \pi, ch)$ is a pre-model and $\pi'(x) = \pi(\tau(x))$),
there is some $v \in W$ with $ch(\tau(x), \langle a_j \rangle \varphi_j) = v$ and $(\tau(x), v) \in R(a_j)$.

– If $\{v\} = L(n_{\ell'})$ for some $1 \le \ell' \le \ell$, then (since we have already fixed
the first level) there is some $r$ with $1 \le r \le \ell$ with $\tau(r) = v$. Add $(x, r)$
to $R'(a_j)$, and set $ch'(x, \langle a_j \rangle \varphi_j) = r$ and $\tau(xj) = \bot$.

– Otherwise, add $(x, xj)$ to $R'(a_j)$, set $\tau(xj) = v$ and $ch'(x, \langle a_j \rangle \varphi_j) = xj$.

Since we started from an adorned pre-model, $(K', \pi', ch')$ is obviously an adorned
pre-model. Moreover, if a sentence $\mu x.\varphi(x)$ is regenerated from $x$ to $y$ in $(K', \pi', ch')$,
then $\mu x.\varphi(x)$ is also regenerated from $\tau(x)$ to $\tau(y)$ in $(K, \pi, ch)$. Since the
latter is well-founded, we thus have that $(K', \pi', ch')$ is well-founded. Next, its
relaxation is a forest (consisting of trees starting at the first level) since the
only edges in $R'$ that “go back”, i.e., that are not of the form $(x, xi)$, are exactly
those that are eliminated in $R'^r$. Finally, $\psi$ is satisfied in one of the root nodes
since, by definition of $(K', \pi', ch')$, we have $j \in \psi^{K'}$ for some $1 \le j \le f(\ell) + 1$.

□ 

Remark 2. Please note that in this construction, if $x$ satisfies a diamond formula
$\langle a \rangle \varphi$, then either a successor $xj$ of $x$ or one of the first level nodes representing
nominals satisfies $\varphi$.

4 Deciding Existence of Forest Models 

It remains to devise a procedure that decides, for a sentence $\psi$, whether it has
a well-founded adorned pre-model whose relaxation is a forest. To this purpose,
we define a two-way alternating tree automaton that accepts exactly the forest
relaxations of $\psi$'s pre-models, provided that we add a new dummy node whose
successors are the root nodes of the forest relaxation.

The automaton depends on a guess which contains relevant information
concerning the interpretation of nominals. The guess makes sure that the following
kind of situation is handled correctly: suppose a nominal $n$ must satisfy a formula
of the form $[a^-]\varphi$, and we have a point $x$ with $(x, n) \in R(a)$, but this relationship
is only implicit since we work on relaxations of pre-models, i.e., $(x, n) \notin R^r(a)$
and $\uparrow^a_n \in \pi^r(x)$. In that case, the guess makes sure that $x$ satisfies $\varphi$ since it
determines which box formulae are satisfied by nominals. Moreover, the guess
determines which nominals are interpreted as the same objects, and how nominals
are related to each other by programs.

It is possible to move all this “guessing” directly into the automaton, so that we
would have only one automaton instead of one per guess. We have chosen, however, to
work with explicit guesses since, on the one hand, this makes explicit the additional
non-determinism one has to cope with in the presence of nominals and how it can
be dealt with. On the other hand, and more importantly, moving the guessing
into the automaton would yield a quadratic blow-up of the state space. Let $n$
be the number of states and m be the length of the acceptance condition of 
a two-way alternating tree automaton. When deciding emptiness of a two-way 
alternating tree automaton [40] , it is transformed into a non-deterministic (one- 
way) parity tree automaton whose state space is of size , and whose 

acceptance condition is of length nw?. Emptiness of the latter automaton can 
be decided in time ™ )(i°gn+ 2 iogm)) |- 25 j^ Hence a (quadratic) blow-up of 

the state space of our initial two-way alternating tree automaton would further 
increase the degree of the polynomial in the exponent of the runtime, and thus 
be rather expensive. 

Formally, a guess consists of three components, the first one consisting, for
each nominal $n$, of a set $\gamma$ of formulae satisfied by a point $u$ with $L(n) = \{u\}$.
Since one point may represent several nominals, we use a second component $f$ to
relate a nominal $n_i$ to “its” set of formulae $\gamma_{f(i)}$. The third component describes
how two points representing nominals are interrelated via (interpretations of)
programs, making sure that, if one is an $a$-successor of the other, then the other
is an $a^-$-successor of the first one.

Definition 6. A guess $\mathcal{G} = (G, f, C)$ for a hybrid $\mu$-calculus sentence
$\psi(n_1, \dots, n_\ell)$ consists of a guess list $G = (\gamma_1, \dots, \gamma_\ell)$ together with connections $C \subseteq$
Norn X Prog^ x Norn and a guess mapping ,£} — >• ,£}, where, 

for each 1 < i,j < £, we have 0 C 7 ^ C cl('i/;) or 74 = _L, G 7/(i)> ^ Ti 

all j ^ f{i), Nomri 7 i = 0 implies 7 ^ = _L, and (ni,a,nj) G G iff {nj,a,ni) G C. 

Theorem 1. Let $\psi$ be a hybrid $\mu$-calculus sentence. For each guess $\mathcal{G}$ for $\psi$, we
define a two-way alternating tree automaton $B(\psi, \mathcal{G})$, such that

1. if $\psi$ is satisfiable, then there exists a guess $\mathcal{G}'$ for $\psi$ such that the language
accepted by $B(\psi, \mathcal{G}')$ is non-empty,

2. if a tree is accepted by $B(\psi, \mathcal{G})$, then eliminating its root node yields a forest
relaxation of a well-founded adorned pre-model of $\psi$, and

3. the number of $B(\psi, \mathcal{G})$'s states is linear in $|\psi|$.

Proof. For ease of presentation, we assume that all input trees are full trees, i.e.,
all non-leaf nodes have the same number of children. As we have seen in the
proof of Lemma 2, we can simply “fill” a tree with additional nodes labelled $\bot$
to make it a full tree. Moreover, we assume a “dummy” root node whose direct
successors are exactly the root nodes of trees in the forest relaxation.

For a sentence $\psi(n_1, \dots, n_\ell)$ with $k'$ diamond subformulae in $\mathrm{cl}(\psi)$ as
specified in the proof of Lemma 2 and a guess $\mathcal{G}$, we define two alternating automata,
$A'(\psi, \mathcal{G})$ and $A(\psi, \mathcal{G})$, and then define $B(\psi, \mathcal{G})$ as the intersection of $A'(\psi, \mathcal{G})$ and
$A(\psi, \mathcal{G})$. For alternating automata, intersection is trivial (basically, we introduce
a new initial state $q$ with $\delta(q, \sigma) = (0, q_0) \wedge (0, q_0')$ for the former initial states
$q_0$, $q_0'$), and the size of $B(\psi, \mathcal{G})$ is the sum of the sizes of $A'(\psi, \mathcal{G})$ and $A(\psi, \mathcal{G})$.

The automaton $A'(\psi, \mathcal{G})$ is rather simple and guarantees that the structure of
the input tree is as required, whereas $A(\psi, \mathcal{G})$ really makes sure that the input
tree (more precisely, the sub-forest of the input tree obtained by eliminating the
root and all nodes labelled with $\bot$) is a relaxation of a well-founded adorned
pre-model.

Both automata work on the same alphabet S, which is defined as follows: 
For Prog^ = {pa,P>a,Pa^Pa | a is a program in ip different from o}, 

27={_L, root)j{cr | a C AP U Norn U Prog^ U {Ani | 1 < j < m and 1 <i < f\, 
a contains, for each a, exclusively, either pa or p^, and, 
exclusively, either p^ or p^} 

The intuition of the additional symbols are as follows: Nodes not representing 
points in a Kripke structure are labelled root and _L, where root labels the root 
node. Nodes having (i.e., the node labelled with the corresponding guess 7/(q) 
as an a-successor are marked Aui, just like in relaxations. A node label contains 
Pa (Pa) if this node is an a-successor (a-successor) of its (unique) predecessor. 
We do allow that a node is both an a- and a /3-successor, or that no program 
can be associated to the edge between two nodes. Analogously, p^ (pG) are used 
to mark those nodes that are not $a$-successors ($a^-$-successors).

The “simple” automaton $A'(\psi, \mathcal{G})$ guarantees that root is only found at the
root label, the nominals in 7 ^ are only found at the zth successors of the root, the 
first level nodes contain no Pa or and that, if a nominal has another nominal 
rij as its a-successor (i.e., if -^Uj is in the label of the node representing rii), then 
rij has Ui as its a-successor (i.e., jg jn the label of the node representing 

Uj). More precisely, M(z/>,t/) = (T", {go, 9 i, ■ • ■ , ?', <?}, <5', <?o) is a safety one- 

way alternating automaton (i.e., each state is accepting and thus every run is 
an accepting run), and 5' is defined as follows for a £ S\ 



^'(90, ct) = <j 


[ false 


^'( 9 ', O') = j 


f true 


[ false 


for 1 <i < ^ 

1 

1 


fAti( 


II 


1 

[ false 


II 


^ AtrC 

[ false 



otherwise 



for each n G Norn fl cr and (n, a, n') G C, ^n' € a 
otherwise 

if (j n Norn = 0 and root yf cr 
otherwise 



Due to the symmetry in the definition of the connection component in a guess 
and the way S'{qi,a) is defined, if A{tp,G) accepts a tree, (nj,— >-ny} C a, and 

rij G cr', then G a', and a, a' label direct successors of the root node. 

The two-way alternating tree automaton $A(\psi, \mathcal{G})$ verifies that the input tree
is indeed a relaxation of a well-founded adorned pre-model. To this purpose,
(most of) its states correspond to formulae in $\mathrm{cl}(\psi)$, and the transition relation
basically follows the semantics.

The first conjunct in the definition of $\delta(q_0, \sigma)$ guarantees that the $i$-th successor
of the root node indeed satisfies all formulae in $\gamma_i$, and that one of the root
node successors satisfies $\psi$.

An additional state $q'$ that “travels” once through the whole input tree makes
sure that, whenever a node has a nominal $n_i$ as its implicit $a$-successor (i.e.,
its label contains $\uparrow^a_{n_i}$), then this node indeed satisfies all formulae $\varphi$ with
$[a^-]\varphi \in \gamma_{f(i)}$.

Finally, the diamond and box formulae on the universal role are treated 
separately since they apply to all but the root node, regardless of marks Pa or 
Pq,. Please note that, since the root node does not represent any point of a Kripke 
structure, 5([o] p, root) is defined such that only all root successors satisfy [o] p, 
but not the root node itself. More precisely, we have 



$A(\psi, \mathcal{G}) = (\Sigma, Q, \delta, q_0, F)$, with

$Q = \{\bot, q_0, q'\} \cup \mathrm{cl}(\psi) \cup \mathit{Prog}^{+}$.

The transition relation $\delta$ is defined as follows: firstly, for $q \in Q$ and $\sigma \in \Sigma$
let



x( n _ / true if <7 = _L 
false otherwise 

Secondly, for 1 < i < f and a G U, let 

= if7i = ^ 

I A,,,G7i (*’ v?) ifTiCcKV') 




true if cr = _L 
false otherwise 



N{cr)= ^ A (0>7’) 
-^rii G o and 
[a] V G 7 /(i) 



Thirdly, for a G S, cr A -L> and a a program, we define 6 as follows: 

^(9o>cr) = ALi-^(*) ^ V*Ai(z,V') A ALAA,?') V (t,T)) 

S{q',a) = N{a)A/\l,{{z,q')V{t,±)) 



for p G AP U Norn U Prog^ : 

'i _ / true if p G CT 
[Pj “i f^jgg otherwise 



for p G AP U Norn : 

'i _ / true if p ^ cr and cr yf root 
^ I false otherwise 
S((pi A P 2 ,cr) = (0,(pi) A (0,(P2) 

5((pi V P 2 ,cr) = (0,(pi) V (0,(P2) 

S(Ax.(p(x), a) = (0, (p(Ax.(p(x))) 



for a ^ {o,o } : 
6{{a)ip,a) = 
for a ^ {o, o”} : 

5([a] if, a) = 

for a G {o,o~} : 
S{{a)(p,a) = 
for cr G {o, 0 “} : 



true if -Arij G cr and ip G 

Vy=i(0'>‘P) A O', Pa)) otherwise 

false if -^rii G a and tp ^ lf(i) 

((— 1, p) V (0,p^)) A otherwise 

Ay=i((j» V {j,pj V (j,T)) 

true if p G 7 /(i) 

Vj=i(A T’) otherwise 



{ (0, p) A (—1, [a] (p) A if root A cr 
Aj=i((j5 [cc] T’) V 0, -L)) 

Ay=i((j. H <P) V (j,T)) otherwise 

Please note that, following the construction in the proof of Lemma 2, satisfaction
of diamond formulae (including those on the universal program) needs
to be tested for only in direct successors and in the nodes representing nominals.
Moreover, since $\psi = \psi(n_1, \dots, n_\ell)$ and due to the definition of $\delta(q_0, \sigma)$ and $\Gamma(i)$,
$\delta$ checks whether the node representing $n_i$ indeed satisfies all formulae in $\gamma_{f(i)}$.

The acceptance condition $F$ is defined analogously to the one in [15,24], and
given here for the sake of completeness. Firstly, for a fixpoint formula $\varphi \in \mathrm{cl}(\psi)$,
define the alternation level of $\varphi$ to be the number of alternating fixpoint formulae
one has to “wrap $\varphi$ with” to reach a sub-sentence of $\psi$. More precisely, the
alternation level of $\varphi = \lambda x.\varphi'(x) \in \mathrm{cl}(\psi)$ is defined as follows [3]: if $\varphi$
is a sentence, then $\mathrm{al}^\psi(\varphi) = 1$. Otherwise, let $\rho = \lambda' y.\rho'(y)$ be the innermost
fixpoint formula in $\mathrm{cl}(\psi)$ that contains $\varphi$ as a proper sub-formula. If $\lambda = \lambda'$, then
$\mathrm{al}^\psi(\varphi) = \mathrm{al}^\psi(\rho)$, otherwise $\mathrm{al}^\psi(\varphi) = \mathrm{al}^\psi(\rho) + 1$. Let $d$ be the maximal alternation
level of (fixpoint) subformulae of $\psi$, and define

$G_i = \{\nu x.\varphi(x) \in \mathrm{cl}(\psi) \mid \mathrm{al}^\psi(\nu x.\varphi(x)) = i\}$

$L_i = \{\mu x.\varphi(x) \in \mathrm{cl}(\psi) \mid \mathrm{al}^\psi(\mu x.\varphi(x)) \le i\}$

Now we are ready to define the acceptance condition $F = \{F_1, \dots, F_{2d}\}$ with
$F_i = \emptyset$ for $i = 0$, $F_i = F_{i-1} \cup L_i$ for odd $i \ge 1$, and $F_i = F_{i-1} \cup G_i$ for even
$i \ge 1$. Obviously, $F_i \subseteq F_{i+1}$ for each $1 \le i < 2d$. As mentioned in Definition 3,
a path of a run $r$ is accepting if the minimal $i$ for which $F_i$ contains a state
occurring infinitely often on that path is even; this $i$ corresponds to the outermost
fixpoint formula that was infinitely often visited/postponed. A run $r$ is accepting
if each of its paths is accepting.
Intuitively, the acceptance condition makes sure that, if a fixpoint formula was
visited infinitely often, then this was a greatest fixpoint formula, and that all
of its least fixpoint super-formulae were visited only finitely many times.

It remains to verify the three claims in Theorem 1. The proof of the first
one uses Lemma 1 and a straightforward construction of a guess $\mathcal{G}$ from a forest
relaxation of a well-founded adorned pre-model, and then shows how an input
forest similar to the one constructed in the proof of Lemma 1 is accepted by
$B(\psi, \mathcal{G})$. The second claim can be proved by taking an accepting run of $B(\psi, \mathcal{G})$
on some input tree, and verifying that the input tree indeed satisfies all properties
of relaxations of well-founded adorned pre-models. Finally, the third claim holds by
definition of $B(\psi, \mathcal{G})$. □

Theorem 2. Satisfiability of the hybrid $\mu$-calculus is decidable in exponential time.

Proof. As we have mentioned in the beginning of Section 4, emptiness of B{ip, G) 
can be decided in time 2®^” i°g") for n = jf/'l- Let £ be the number of nominals 
and m the number of programs different from o in ip. Since, for a guess G = 
(G, /, G), the mapping / is determined by G, the number of guesses is bound by 
the number of connections and guess lists, i.e., by 2^ Hence we have to test 

at most an exponential number of automata B{ip,G) for emptiness. Combining 
these results with Lemma 1, Lemma 2, and Theorem 1 concludes the proof. □ 

5 Conclusion 

We have shown that satisfiability of the hybrid $\mu$-calculus can be decided in
exponential time, thus partially answering an open question in [5]. Deciding
satisfiability of a logic that lacks the tree model property using tree automata
was possible using a certain abstraction of models, relaxations, and involved an
additional non-determinism, guesses. Then, we were able to use the emptiness
algorithm in [40] as a sub-routine. For an input sentence, the algorithm presented
constructs a family of tree automata, each of which depends on a guess that
determines relevant information concerning the interpretation of nominals. We
have chosen this explicit guess since, on the one hand, it directly shows how
nominals can be dealt with. On the other hand, moving the guessing into
the automaton would blow up its state space quadratically. Since deciding
emptiness of this family of automata is exponential in the size of its state space,
it is clearly preferable to avoid even such a polynomial blow-up. The complexity
of the hybrid $\mu$-calculus with deterministic programs* remains an interesting
open problem. As a consequence of NExpTime-hardness results in [37], this
extension leads to NExpTime-hardness. Another interesting research problem is
the development of practical decision procedures for (fragments of) the hybrid
$\mu$-calculus. To the best of our knowledge, automata-theoretic methods are the
only known methods for the $\mu$-calculus, and, so far, such methods have been
implemented successfully only for linear temporal logic, see, e.g., [8,20].



References 

1. C. Areces, P. Blackburn, and M. Marx. The computational complexity of hybrid
temporal logics. Logic Journal of the IGPL, 8(5), 2000.

2. F. Baader and B. Hollunder. A terminological knowledge representation system
with complete inference algorithm. In Proc. of PDK-91, vol. 567 of LNAI.
Springer-Verlag, 1991.

3. G. Bhat and R. Cleaveland. Efficient local model-checking for fragments of the
modal $\mu$-calculus. In Proc. of TACAS, vol. 1055 of LNCS. Springer-Verlag, 1996.

4. P. Blackburn. Nominal tense logic. Notre Dame Journal of Formal Logic, 34, 1993.

5. D. Calvanese, G. De Giacomo, and M. Lenzerini. Reasoning in expressive
description logics with fixpoints based on automata on infinite trees. In Proc. of IJCAI'99,
1999.

6. D. Calvanese, G. De Giacomo, M. Lenzerini, D. Nardi, and R. Rosati. Description
logic framework for information integration. In Proc. of KR-98, 1998.

7. D. Calvanese, M. Lenzerini, and D. Nardi. Description logics for conceptual data
modeling. In Logics for Databases and Information Systems. Kluwer Academic
Publisher, 1998.

8. E.M. Clarke, O. Grumberg, and K. Hamaguchi. Another look at LTL model
checking. In Proc. of CAV’94, vol. 818 of LNCS, pages 415-427. Springer-Verlag, 1994.

9. G. De Giacomo and M. Lenzerini. Boosting the correspondence between description
logics and propositional dynamic logics. In Proc. of AAAI-94, 1994.

10. G. De Giacomo and M. Lenzerini. Concept language with number restrictions and
fixpoints, and its relationship with $\mu$-calculus. In Proc. of ECAI-94, 1994.

11. G. De Giacomo and M. Lenzerini. Tbox and Abox reasoning in expressive
description logics. In Proc. of KR-96. Morgan Kaufmann, 1996.

* Or Description Logic’s number restrictions or Modal Logic’s graded modalities.










12. F. Donini, M. Lenzerini, D. Nardi, and W. Nutt. The complexity of concept
languages. In Proc. of KR-91. Morgan Kaufmann, 1991.

13. F. M. Donini, M. Lenzerini, D. Nardi, and W. Nutt. The complexity of concept
languages. Information and Computation, 134, 1997.

14. E. A. Emerson and C. S. Jutla. Tree automata, $\mu$-calculus, and determinacy. In
Proc. of FOCS-91. IEEE, 1991.

15. E. A. Emerson, C. S. Jutla, and A. P. Sistla. On model checking for fragments of
the $\mu$-calculus. In Proc. of CAV’93, vol. 697 of LNCS. Springer-Verlag, 1993.

16. D. Fensel, I. Horrocks, F. van Harmelen, S. Decker, M. Erdmann, and M. Klein.
OIL in a nutshell. In Proc. EKAW-2000, vol. 1937 of LNAI, 2000. Springer-Verlag.

17. K. Fine. In so many possible worlds. Notre Dame J. of Formal Logic, 13, 1972.

18. E. Franconi and U. Sattler. A data warehouse conceptual data model for
multidimensional aggregation: a preliminary report. AI*IA Notizie, 1, 1999.

19. V. Haarslev and R. Möller. Expressive abox reasoning with number restrictions,
role hierarchies, and transitively closed roles. In Proc. of KR-00, 2000.

20. Gerard J. Holzmann. The spin model checker. IEEE Trans. on Software
Engineering, 23(5), 1997.

21. I. Horrocks. Using an Expressive Description Logic: FaCT or Fiction? In Proc. of
KR-98, 1998.

22. I. Horrocks, U. Sattler, and S. Tobies. Practical reasoning for very expressive
description logics. Logic Journal of the IGPL, 8(3), May 2000.

23. D. Kozen. Results on the propositional $\mu$-calculus. In Proc. of ICALP’82, vol. 140
of LNCS. Springer-Verlag, 1982.

24. O. Kupferman and M. Y. Vardi. $\mu$-calculus synthesis. In Proc. MFCS’00, LNCS.
Springer-Verlag, 2000.

25. O. Kupferman and M.Y. Vardi. Weak alternating automata and tree automata
emptiness. In Proc. of STOC-98, 1998.

26. H. Levesque and R. J. Brachman. Expressiveness and tractability in knowledge
representation and reasoning. Computational Intelligence, 3, 1987.

27. D. E. Muller and P. E. Schupp. Alternating automata on infinite trees. Theoretical
Computer Science, 54(1-2), 1987.

28. B. Nebel. Reasoning and Revision in Hybrid Representation Systems. LNAI.
Springer-Verlag, 1990.

29. P. F. Patel-Schneider and I. Horrocks. DLP and FaCT. In Proc. TABLEAUX-99,
vol. 1397 of LNAI. Springer-Verlag, 1999.

30. A. Prior. Past, Present and Future. Oxford University Press, 1967.

31. A. Rector and I. Horrocks. Experience building a large, re-usable medical ontology
using a description logic with transitivity and concept inclusions. In Proc. of the
AAAI Spring Symposium on Ontological Engineering. AAAI Press, 1997.

32. K. Schild. A correspondence theory for terminological logics: Preliminary report.
In Proc. of IJCAI-91, 1991.

33. K. Schild. Terminological cycles and the propositional $\mu$-calculus. In Proc. of
KR-94, 1994. Morgan Kaufmann.

34. M. Schmidt-Schauß and G. Smolka. Attributive concept descriptions with
complements. Artificial Intelligence, 48(1), 1991.

35. R. S. Streett and E. A. Emerson. An automata theoretic decision procedure for
the propositional $\mu$-calculus. Information and Computation, 81(3), 1989.

36. W. Thomas. Languages, automata, and logic. In Handbook of Formal Language
Theory, vol 1. Springer-Verlag, 1997.

37. S. Tobies. The complexity of reasoning with cardinality restrictions and nominals
in expressive description logics. J. of Artificial Intelligence Research, 12, 2000.









38. S. Tobies. PSPACE reasoning for graded modal logics. J. of Logic and
Computation, 2001. To appear.

39. M. Y. Vardi. What makes modal logic so robustly decidable? In Descriptive
Complexity and Finite Models, American Mathematical Society, 1997.

40. M. Y. Vardi. Reasoning about the past with two-way automata. In Proc. of
ICALP’98, vol. 1443 of LNCS, 1998. Springer-Verlag.




The Inverse Method Implements the Automata 
Approach for Modal Satisfiability 



Franz Baader and Stephan Tobies 

LuFG Theoretical Computer Science, RWTH Aachen, Germany 
{baader,tobies}@cs.rwth-aachen.de



Abstract. This paper ties together two distinct strands in automated 
reasoning: the tableau- and the automata-based approach. It shows that 
the inverse tableau method can be viewed as an implementation of the 
automata approach. This is of interest to automated deduction because 
Voronkov recently showed that the inverse method yields a viable deci- 
sion procedure for the modal logic K. 



1 Introduction 

Decision procedures for (propositional) modal logics and description logics play 
an important role in knowledge representation and verification. When developing 
such procedures, one is both interested in their worst-case complexity and in 
their behavior in practical applications. From the theoretical point of view, it 
is desirable to obtain an algorithm whose worst-case complexity matches the 
complexity of the problem. From the practical point of view it is more important 
to have an algorithm that is easy to implement and amenable to optimizations, 
so that it behaves well on practical instances of the decision problem. The most 
popular approaches for constructing decision procedures for modal logics are i) 
semantic tableaux and related methods [10,2]; ii) translations into classical first- 
order logics [15,1]; and iii) reductions to the emptiness problem for certain (tree) 
automata [17,14]. 

Whereas highly optimized tableaux and translation approaches behave quite
well in practice [11,12], it is sometimes hard to obtain exact worst-case
complexity results using these approaches. For example, satisfiability in the basic modal
logic K w.r.t. global axioms is known to be ExpTime-complete [16]. However,
the “natural” tableaux algorithm for this problem is a NExpTime-algorithm [2],
and it is rather hard to construct a tableaux algorithm that runs in deterministic
exponential time [6]. In contrast, it is folklore that the automata approach yields
a very simple proof that satisfiability in K w.r.t. global axioms is in ExpTime.
However, the algorithm obtained this way is not only worst-case, but also
best-case exponential: it first constructs an automaton that is always exponential in
the size of the input formulae (its set of states is the powerset of the set of
subformulae of the input formulae), and then applies the (polynomial) emptiness test
to this large automaton. To overcome this problem, one must try to construct
the automaton “on-the-fly” while performing the emptiness test. Whereas this
idea has successfully been used for automata that perform model checking [9,5],
to the best of our knowledge it has not yet been applied to satisfiability checking.

R. Goré, A. Leitsch, and T. Nipkow (Eds.): IJCAR 2001, LNAI 2083, pp. 92-106, 2001.
Springer-Verlag Berlin Heidelberg 2001

The original motivation of this work was to compare the automata and the
tableaux approaches, with the ultimate goal of obtaining an approach that
combines the advantages of both, without possessing any of the disadvantages. As
a starting point, we wanted to see whether the tableaux approach could be
viewed as an on-the-fly realization of the emptiness test done by the automata
approach. At first sight, this idea was persuasive since a run of the automaton
constructed by the automata approach (which is a so-called looping automaton
working on infinite trees) looks very much like a run of the tableaux procedure,
and the tableaux procedure does generate sets of formulae on-the-fly. However,
the polynomial emptiness test for looping automata does not try to construct a
run starting with the root of the tree, as done by the tableaux approach. Instead,
it computes inactive states, i.e., states that can never occur on a successful run
of the automaton, and tests whether all initial states are inactive. This
computation starts “from the bottom” by locating obviously inactive states (i.e.,
states without successor states), and then “propagates” inactiveness along the
transition relation. Thus, the emptiness test works in the opposite direction of
the tableaux procedure. This observation suggested considering an approach that
inverts the tableaux approach: this is just the so-called inverse method. Recently,
Voronkov [19] has applied this method to obtain a bottom-up decision procedure
for satisfiability in K, and has optimized and implemented this procedure.

In this paper we will show that the inverse method for K can indeed be seen
as an on-the-fly realization of the emptiness test done by the automata approach
for K. The benefits of this result are two-fold. First, it shows that Voronkov’s
implementation, which behaves quite well in practice, is an optimized on-the-fly
implementation of the automata-based satisfiability procedure for K. Second, it
can be used to give a simpler proof of the fact that Voronkov’s optimizations do
not destroy completeness of the procedure. We will also show how the inverse
method can be extended to handle global axioms, and that the correspondence
to the automata approach still holds in this setting. In particular, the inverse
method yields an ExpTime-algorithm for satisfiability in K w.r.t. global axioms.

2 Preliminaries 

First, we briefly introduce the modal logic K and some technical definitions
related to K-formulae, which are used later on to formulate the inverse calculus
and the automata approach for K. Then, we define the type of automata used to
decide satisfiability (w.r.t. global axioms) in K. These so-called looping automata
[18] are a specialization of Büchi tree automata.



Modal Formulae. We assume the reader to be familiar with the basic notions 
of modal logic. For a thorough introduction to modal logics, refer to, e.g., [4]. 

K-formulae are built inductively from a countably infinite set $\mathcal{P} = \{p_1, p_2, \dots\}$
of propositional atoms using the Boolean connectives $\wedge$, $\vee$, and $\neg$
and the unary modal operators $\Box$ and $\Diamond$. The semantics of K-formulae is defined
as usual, based on Kripke models $\mathcal{M} = (W, R, V)$ where $W$ is a non-empty set,
$R \subseteq W \times W$ is an accessibility relation, and $V : \mathcal{P} \to 2^W$ is a valuation mapping
propositional atoms to the set of worlds they hold in. The relation $\models$ between
models, worlds, and formulae is defined in the usual way. Let $G$, $H$ be K-formulae.
Then $G$ is satisfiable iff there exists a Kripke model $\mathcal{M} = (W, R, V)$ and a world
$w \in W$ with $\mathcal{M}, w \models G$. The formula $G$ is satisfiable w.r.t. the global axiom $H$ iff
there exists a Kripke model $\mathcal{M} = (W, R, V)$ and a world $w \in W$ such that $\mathcal{M}, w \models G$
and $\mathcal{M}, w' \models H$ for all $w' \in W$. K-satisfiability is PSpace-complete [13], and
K-satisfiability w.r.t. global axioms is ExpTime-complete [16].
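
The semantics described above can be illustrated by a small evaluator over a finite Kripke model. This is only a sketch under assumptions that are not part of the paper: formulae are encoded as nested tuples, a model is a triple of Python sets and a dictionary, and the names `holds` and `witnesses` are hypothetical. It checks a formula in one given model; it does not decide satisfiability.

```python
# Assumed encoding (not from the paper): formulae are nested tuples
#   ("atom", "p1"), ("not", F), ("and", F1, F2), ("or", F1, F2),
#   ("box", F), ("dia", F)
# A model is (W, R, V): a set of worlds, a set of pairs, and a dict
# mapping atom names to the set of worlds where they hold.

def holds(model, w, f):
    """Return True iff the formula f holds at world w of the model."""
    W, R, V = model
    tag = f[0]
    if tag == "atom":
        return w in V.get(f[1], set())
    if tag == "not":
        return not holds(model, w, f[1])
    if tag == "and":
        return holds(model, w, f[1]) and holds(model, w, f[2])
    if tag == "or":
        return holds(model, w, f[1]) or holds(model, w, f[2])
    successors = [v for (u, v) in R if u == w]
    if tag == "box":
        # vacuously true at worlds without successors
        return all(holds(model, v, f[1]) for v in successors)
    if tag == "dia":
        return any(holds(model, v, f[1]) for v in successors)
    raise ValueError(f"unknown connective: {tag}")


def witnesses(model, g, h=None):
    """Worlds satisfying g, provided the global axiom h (if given) holds in every world."""
    W, R, V = model
    if h is not None and not all(holds(model, w, h) for w in W):
        return set()
    return {w for w in W if holds(model, w, g)}
```

A non-empty result of `witnesses(model, g, h)` shows that this particular model witnesses satisfiability of `g` w.r.t. the global axiom `h`; an empty result says nothing about other models.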

A K-formula is in negation normal form (NNF) if $\neg$ occurs only in front of
propositional atoms. Every K-formula can be transformed (in linear time) into
an equivalent formula in NNF using de Morgan’s laws and the duality of the
modal operators.
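
The NNF transformation can be sketched as a recursive function that pushes negations inward. The tuple encoding of formulae and the name `nnf` are assumptions made for this illustration, not notation from the paper.

```python
# Assumed encoding (not from the paper): formulae are nested tuples
#   ("atom", "p1"), ("not", F), ("and", F1, F2), ("or", F1, F2),
#   ("box", F), ("dia", F)

def nnf(f):
    """Return an equivalent formula in which 'not' occurs only in front of atoms."""
    tag = f[0]
    if tag == "atom":
        return f
    if tag in ("and", "or"):
        return (tag, nnf(f[1]), nnf(f[2]))
    if tag in ("box", "dia"):
        return (tag, nnf(f[1]))
    if tag != "not":
        raise ValueError(f"unknown connective: {tag}")
    g = f[1]  # f is a negation; inspect what is being negated
    inner = g[0]
    if inner == "atom":
        return f                                  # negated atom: already in NNF
    if inner == "not":
        return nnf(g[1])                          # double negation
    if inner == "and":                            # de Morgan
        return ("or", nnf(("not", g[1])), nnf(("not", g[2])))
    if inner == "or":                             # de Morgan
        return ("and", nnf(("not", g[1])), nnf(("not", g[2])))
    if inner == "box":                            # duality of the modal operators
        return ("dia", nnf(("not", g[1])))
    if inner == "dia":
        return ("box", nnf(("not", g[1])))
    raise ValueError(f"unknown connective: {inner}")
```

Each connective of the input is visited a constant number of times, which matches the linear-time claim for this encoding.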

For the automata and calculi considered here, sub-formulae of G play an im- 
portant role and we will often need operations going from a formula to its super- 
or sub-formulae. As observed in [19], this becomes easier when dealing with 
“addresses” of sub-formulae in G rather than with the sub-formulae themselves. 

Definition 1 (G-Paths). For a K-formula $G$ in NNF, the set of $G$-paths $\Pi_G$
is a set of words over the alphabet $\{\vee_l, \vee_r, \wedge_l, \wedge_r, \Box, \Diamond\}$. The set $\Pi_G$ and the
sub-formula $G|_\pi$ of $G$ addressed by $\pi \in \Pi_G$ are defined inductively as follows:

– $\epsilon \in \Pi_G$ and $G|_\epsilon = G$

– if $\pi \in \Pi_G$ and

  – $G|_\pi = F_1 \wedge F_2$ then $\pi\wedge_l, \pi\wedge_r \in \Pi_G$, $G|_{\pi\wedge_l} = F_1$, $G|_{\pi\wedge_r} = F_2$, and $\pi$ is
  called $\wedge$-path

  – $G|_\pi = F_1 \vee F_2$ then $\pi\vee_l, \pi\vee_r \in \Pi_G$, $G|_{\pi\vee_l} = F_1$, $G|_{\pi\vee_r} = F_2$, and $\pi$ is
  called $\vee$-path

  – $G|_\pi = \Box F$ then $\pi\Box \in \Pi_G$, $G|_{\pi\Box} = F$ and $\pi$ is called $\Box$-path

  – $G|_\pi = \Diamond F$ then $\pi\Diamond \in \Pi_G$, $G|_{\pi\Diamond} = F$ and $\pi$ is called $\Diamond$-path

– $\Pi_G$ is the smallest set that satisfies the previous conditions.

We use $\wedge_*$ and $\vee_*$ as placeholders for $\wedge_l, \wedge_r$ and $\vee_l, \vee_r$, resp. Also, we use $\bowtie$
and $\circ$ as placeholders for $\wedge, \vee$ and $\Box, \Diamond$, resp. If $\pi$ is a $\wedge$- or a $\vee$-path then
$\pi$ is called $\bowtie$-path. If $\pi$ is a $\Box$- or a $\Diamond$-path then $\pi$ is called $\circ$-path. Fig. 1 shows
an example of a K-formula $G$ and the corresponding set $\Pi_G$, which can be read
off the edge labels. For example, $\wedge_r\wedge_r$ is a $G$-path and $G|_{\wedge_r\wedge_r} = \Box(\neg p_2 \vee p_1)$.
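
As an illustration of Definition 1, the following sketch enumerates the $G$-paths of an NNF formula together with the sub-formulae they address. The tuple encoding, the letter names such as `"and_l"`, and the function name `g_paths` are assumptions made for this example only.

```python
# Assumed encoding (not from the paper): NNF formulae are nested tuples
#   ("atom", "p1"), ("not", ("atom", "p1")), ("and", F1, F2),
#   ("or", F1, F2), ("box", F), ("dia", F)
# Paths are tuples over the letters
#   "and_l", "and_r", "or_l", "or_r", "box", "dia".

def g_paths(g, prefix=()):
    """Map every G-path (starting from the empty path) to the sub-formula it addresses."""
    paths = {prefix: g}
    tag = g[0]
    if tag in ("and", "or"):
        paths.update(g_paths(g[1], prefix + (tag + "_l",)))
        paths.update(g_paths(g[2], prefix + (tag + "_r",)))
    elif tag in ("box", "dia"):
        paths.update(g_paths(g[1], prefix + (tag,)))
    # atoms and negated atoms are leaves: no letter extends the path
    return paths


# The formula of Fig. 1: <>~p1 /\ ([]p2 /\ [](~p2 \/ p1))
p1, p2 = ("atom", "p1"), ("atom", "p2")
G = ("and",
     ("dia", ("not", p1)),
     ("and", ("box", p2), ("box", ("or", ("not", p2), p1))))
paths = g_paths(G)
```

Here `paths[("and_r", "and_r")]` is the encoding of $\Box(\neg p_2 \vee p_1)$, in line with the example given in the text.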



Fig. 1. The set $\Pi_G$ for $G = \Diamond\neg p_1 \wedge (\Box p_2 \wedge \Box(\neg p_2 \vee p_1))$

Looping Automata. For a natural number $n$, let $[n]$ denote the set $\{1, \dots, n\}$.
An $n$-ary infinite tree over the alphabet $\Sigma$ is a mapping $t : [n]^* \to \Sigma$. An $n$-ary
looping tree automaton is a tuple $\mathfrak{A} = (Q, \Sigma, I, \Delta)$, where $Q$ is a finite set of
states, $\Sigma$ is a finite alphabet, $I \subseteq Q$ is the set of initial states, and $\Delta \subseteq Q \times \Sigma \times Q^n$
is the transition relation. Sometimes, we will view $\Delta$ as a function from $Q \times \Sigma$
to $2^{Q^n}$ and write $\Delta(q, \sigma)$ for the set $\{\mathbf{q} \mid (q, \sigma, \mathbf{q}) \in \Delta\}$. A run of $\mathfrak{A}$ on a tree $t$
is an $n$-ary infinite tree $r$ over $Q$ such that $(r(p), t(p), (r(p1), \dots, r(pn))) \in \Delta$ for
every $p \in [n]^*$. The automaton $\mathfrak{A}$ accepts $t$ iff there is a run $r$ of $\mathfrak{A}$ on $t$ such that
$r(\epsilon) \in I$. The set $L(\mathfrak{A}) := \{t \mid \mathfrak{A} \text{ accepts } t\}$ is the language accepted by $\mathfrak{A}$.

Since looping tree automata are special Büchi tree automata, emptiness
of their accepted language can effectively be tested using the well-known
(quadratic) emptiness test for Büchi automata [17]. However, for looping tree
automata this algorithm can be specialized into a simpler (linear) one. Though 
this is well-known, there appears to be no reference for the result. 

Intuitively, the algorithm works by computing inactive states. A state $q \in Q$
is active iff there exists a tree $t$ and a run of $\mathfrak{A}$ on $t$ in which $q$ occurs; otherwise,
$q$ is inactive. It is easy to see that a looping tree automaton accepts at least
one tree iff it has an active initial state. How can the set of inactive states be
computed? Obviously, a state from which no successor states are reachable is
inactive. Moreover, a state is inactive if every transition possible from that state
involves an inactive state. Thus, one can start with the set

$$Q_0 := \{q \in Q \mid \forall \sigma \in \Sigma.\ \Delta(q, \sigma) = \emptyset\}$$

of obviously inactive states, and then propagate inactiveness through the tran-
sition relation. We formalize this propagation process in a way that allows for
an easy formulation of our main results.

A derivation of the emptiness test is a sequence $Q_0 \rhd Q_1 \rhd \ldots \rhd Q_k$ such
that $Q_i \subseteq Q$ and $Q_i \rhd Q_{i+1}$ iff $Q_{i+1} = Q_i \cup \{q\}$ with

$$q \in \{q' \in Q \mid \forall \sigma \in \Sigma.\ \forall (q_1, \ldots, q_n) \in \Delta(q', \sigma).\ \exists j.\ q_j \in Q_i\}.$$

We write $Q_0 \rhd^* P$ iff there is a $k \in \mathbb{N}$ and a derivation $Q_0 \rhd \ldots \rhd Q_k$ with
$P = Q_k$. The emptiness test answers "$L(\mathfrak{A}) = \emptyset$" iff there exists a set of states
$P$ such that $Q_0 \rhd^* P$ and $I \subseteq P$.

Note that $Q \rhd P$ implies $Q \subseteq P$ and that $Q \subseteq Q'$ and $Q \rhd P$ imply
$Q' \rhd^* P$. Consequently, the closure $Q_0^{\rhd}$ of $Q_0$ under $\rhd$, defined by $Q_0^{\rhd} := \bigcup\{P \mid
Q_0 \rhd^* P\}$, can be calculated starting with $Q_0$, and successively adding states $q$ to
the current set $Q_i$ such that $Q_i \rhd Q_i \cup \{q\}$ and $q \notin Q_i$, until no more states can
be added. It is easy to see that this closure consists of the set of inactive states,
and thus $L(\mathfrak{A}) = \emptyset$ iff $I \subseteq Q_0^{\rhd}$. As described until now, this algorithm runs in
time polynomial in the number of states. By using clever data structures and a
propagation algorithm similar to the one for satisfiability of propositional Horn
formulae [7], one can obtain a linear emptiness test for looping tree automata.
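The propagation of inactiveness can be sketched as a naive fixpoint computation. This is the polynomial version described above, not the linear one. The representation of the transition relation as a dictionary from `(state, letter)` pairs to sets of successor tuples, and the names `inactive_states` and `is_empty`, are assumptions of this sketch.

```python
def inactive_states(states, alphabet, delta):
    """Compute the closure of Q_0 under propagation of inactiveness.

    delta maps (state, letter) to a set of successor tuples; a missing key
    means that there is no transition for that state and letter.
    """
    inactive = set()
    changed = True
    while changed:
        changed = False
        for q in states:
            if q in inactive:
                continue
            # q is inactive if every transition from q involves an inactive
            # state. States without any transition satisfy this vacuously,
            # so the first pass collects exactly the set Q_0.
            if all(any(s in inactive for s in succ)
                   for a in alphabet
                   for succ in delta.get((q, a), ())):
                inactive.add(q)
                changed = True
    return inactive


def is_empty(states, alphabet, initial, delta):
    """The accepted language is empty iff all initial states are inactive."""
    return set(initial) <= inactive_states(states, alphabet, delta)


# A small binary automaton over a one-letter alphabet (hypothetical example).
states = ["a", "b", "c", "d"]
alphabet = ["x"]
delta = {
    ("a", "x"): {("a", "b")},
    ("b", "x"): {("b", "b")},
    ("c", "x"): {("c", "d")},   # every transition from c involves d
    # d has no transitions at all
}
inactive = inactive_states(states, alphabet, delta)
print(sorted(inactive))
```

In this example `d` is inactive because it has no transitions, and `c` becomes inactive in the next round because its only transition involves `d`.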

3 Automata, Modal Formulae, and the Inverse Calculus 

We first describe how to decide satisfiability in K using the automata approach 
and the inverse method, respectively. Then we show that both approaches are 
closely connected. 

3.1 Automata and Modal Formulae 

Given a K-formula $G$, we define an automaton $\mathfrak{A}_G$ such that $L(\mathfrak{A}_G) = \emptyset$ iff $G$
is not satisfiable. In contrast to the "standard" automata approach, the states
of our automaton $\mathfrak{A}_G$ will be subsets of $\Pi_G$ rather than sets of subformulae
of $G$. Using paths instead of subformulae is mostly a matter of notation. We
also require the states to satisfy additional properties (i.e., we do not allow for
arbitrary subsets of $\Pi_G$). This makes the proof of correctness of the automata
approach only slightly more complicated, and it allows us to treat some im- 
portant optimizations of the inverse calculus within our framework. The next 
definition introduces these properties. 

Definition 2 (Propositionally expanded, clash). Let $G$ be a K-formula in
NNF, $\Pi_G$ the set of G-paths, and $\Phi \subseteq \Pi_G$. An $\wedge$-path $\pi \in \Phi$ is propositionally
expanded in $\Phi$ iff $\{\pi\wedge_l, \pi\wedge_r\} \subseteq \Phi$. A $\vee$-path $\pi \in \Phi$ is propositionally expanded
in $\Phi$ iff $\{\pi\vee_l, \pi\vee_r\} \cap \Phi \neq \emptyset$. The set $\Phi$ is propositionally expanded iff every
$\bowtie$-path $\pi \in \Phi$ is propositionally expanded in $\Phi$. We use "p.e." as an abbreviation
for "propositionally expanded".

The set $\Phi'$ is an expansion of the set $\Phi$ if $\Phi \subseteq \Phi'$, $\Phi'$ is p.e. and $\Phi'$ is minimal
w.r.t. set inclusion with these properties. For a set $\Phi$, we define the set of its
expansions as $\langle\langle\Phi\rangle\rangle := \{\Phi' \mid \Phi' \text{ is an expansion of } \Phi\}$.

$\Phi$ contains a clash iff there are two paths $\pi_1, \pi_2 \in \Phi$ such that $G|_{\pi_1} = p$ and
$G|_{\pi_2} = \neg p$ for a propositional variable $p$. Otherwise, $\Phi$ is called clash-free.

For a set of paths $\Phi$, the set $\langle\langle\Phi\rangle\rangle$ can effectively be constructed by successively
adding paths required by the definition of p.e. A formal construction of the
closure can be found in the proof of Lemma 4. Note that $\emptyset$ is p.e., clash-free,
and $\langle\langle\emptyset\rangle\rangle = \{\emptyset\}$.
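The construction of the expansions can be sketched as a recursive case split, mirroring the tree used later in the proof of Lemma 4. As before, the tuple encoding of formulae and paths and the helper names `sub`, `expansions`, and `has_clash` are assumptions of this sketch.

```python
def sub(g, path):
    """Return the sub-formula of g addressed by path."""
    for step in path:
        g = g[2] if step.endswith("_r") else g[1]
    return g


def expansions(g, phi):
    """Return the expansions of phi as a set of frozensets of paths."""
    phi = frozenset(phi)
    for pi in sorted(phi):
        op = sub(g, pi)[0]
        left, right = pi + (op + "_l",), pi + (op + "_r",)
        if op == "and" and not {left, right} <= phi:
            # An and-path needs both successors.
            return expansions(g, phi | {left, right})
        if op == "or" and not ({left, right} & phi):
            # An or-path needs one successor: branch on the choice.
            return expansions(g, phi | {left}) | expansions(g, phi | {right})
    return {phi}  # phi is propositionally expanded


def has_clash(g, phi):
    """True iff phi addresses both p and not p for some variable p."""
    literals = [sub(g, pi) for pi in phi]
    return any(("not", lit) in literals for lit in literals if lit[0] == "var")


p1, p2 = ("var", "p1"), ("var", "p2")
G = ("and", ("dia", ("not", p1)),
            ("and", ("box", p2), ("box", ("or", ("not", p2), p1))))

initial = expansions(G, {()})
psi = {("and_l", "dia"), ("and_r", "and_l", "box"), ("and_r", "and_r", "box")}
successors = expansions(G, psi)
print(len(initial), len(successors), all(has_clash(G, s) for s in successors))
```

For the example formula, the set containing only the empty path has a single expansion, while the set `psi` has two expansions, one per disjunct, and both contain a clash.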

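Definition 3 (Formula Automaton). For a K-formula $G$ in NNF, we fix an
arbitrary enumeration $\{\pi_1, \ldots, \pi_n\}$ of the $\Diamond$-paths in $\Pi_G$. The n-ary looping
automaton $\mathfrak{A}_G$ is defined by $\mathfrak{A}_G := (Q_G, \Sigma_G, \langle\langle\{\epsilon\}\rangle\rangle, \Delta_G)$, where $Q_G := \Sigma_G :=
\{\Phi \subseteq \Pi_G \mid \Phi \text{ is p.e.}\}$ and the transition relation $\Delta_G$ is defined as follows:

- $\Delta_G$ contains only tuples of the form $(\Phi, \Phi, \ldots)$.

- If $\Phi$ is clash-free, then we define $\Delta_G(\Phi, \Phi) := \langle\langle\Psi_1\rangle\rangle \times \cdots \times \langle\langle\Psi_n\rangle\rangle$, where

  $$\Psi_i = \begin{cases} \{\pi_i\Diamond\} \cup \{\pi\Box \mid \pi \in \Phi \text{ is a } \Box\text{-path}\} & \text{if } \pi_i \in \Phi \text{ for the } \Diamond\text{-path } \pi_i \\ \emptyset & \text{else} \end{cases}$$

- If $\Phi$ contains a clash, then $\Delta_G(\Phi, \Phi) = \emptyset$, i.e., there is no transition from $\Phi$.

Note that this definition implies $\Delta_G(\emptyset, \emptyset) = \{(\emptyset, \ldots, \emptyset)\}$ and only states with a
clash have no successor states.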



Theorem 1. For a K-formula $G$, $G$ is satisfiable iff $L(\mathfrak{A}_G) \neq \emptyset$.

This theorem can be proved by showing that i) every tree accepted by $\mathfrak{A}_G$ induces
a model of $G$; and ii) every model $\mathcal{M}$ of $G$ can be turned into a tree accepted
by $\mathfrak{A}_G$ by a) unraveling $\mathcal{M}$ into a tree model $\mathcal{T}$ for $G$; b) labeling every world
of $\mathcal{T}$ with a suitable p.e. set depending on the formulae that hold in this world;
and c) padding "holes" in $\mathcal{T}$ with $\emptyset$.

Together with the emptiness test for looping tree automata, Theorem 1 yields
a decision procedure for K-satisfiability. To test a K-formula $G$ for unsatisfiability,
construct $\mathfrak{A}_G$ and test whether $L(\mathfrak{A}_G) = \emptyset$ holds using the emptiness test for
looping tree automata: $L(\mathfrak{A}_G) = \emptyset$ iff $\langle\langle\{\epsilon\}\rangle\rangle \subseteq Q_0^{\rhd}$, where $Q_0 \subseteq Q_G$ is the set
of states containing a clash. The following is a derivation of a superset of $\langle\langle\{\epsilon\}\rangle\rangle$
from $Q_0$ for the example formula from Fig. 1:

$$Q_0 \;\rhd\; Q_0 \cup \{\{\nu_0, \nu_1, \nu_2, \nu_3, \nu_4\}\} = Q_0 \cup \langle\langle\{\epsilon\}\rangle\rangle$$



3.2 The Inverse Calculus 

In the following, we introduce the inverse calculus for K. We stay close to the 
notation and terminology used in [19]. 

A sequent is a subset of $\Pi_G$. Sequents will be denoted by capital Greek letters.
The union of two sequents $\Gamma$ and $\Lambda$ is denoted by $\Gamma, \Lambda$. If $\Gamma$ is a sequent and
$\pi \in \Pi_G$ then we denote $\Gamma \cup \{\pi\}$ by $\Gamma, \pi$. If $\Gamma$ is a sequent that contains only
$\Box$-paths then we write $\Gamma\Box$ to denote the sequent $\{\pi\Box \mid \pi \in \Gamma\}$. Since states of
$\mathfrak{A}_G$ are also subsets of $\Pi_G$ and hence sequents, we will later on use the same
notational conventions for states as for sequents.

Definition 4 (The inverse path calculus). Let $G$ be a formula in NNF and
$\Pi_G$ the set of paths of $G$. Axioms of the inverse calculus are all sequents $\{\pi_1, \pi_2\}$
such that $G|_{\pi_1} = p$ and $G|_{\pi_2} = \neg p$ for some propositional variable $p$. The rules
of the inverse calculus are given in Fig. 2, where all paths occurring in a sequent
are G-paths and, for every $\Diamond^+$ inference, $\pi$ is a $\Diamond$-path. We refer to this calculus
by $\mathrm{IC}_G$.^1

^1 $G$ appears in the subscript because the calculus is highly dependent on the input
formula $G$: only G-paths can be generated by $\mathrm{IC}_G$.

($\vee$) $\dfrac{\Gamma_l, \pi\vee_l \qquad \Gamma_r, \pi\vee_r}{\Gamma_l, \Gamma_r, \pi}$

($\wedge_l$) $\dfrac{\Gamma, \pi\wedge_l}{\Gamma, \pi}$

($\wedge_r$) $\dfrac{\Gamma, \pi\wedge_r}{\Gamma, \pi}$

($\Diamond$) $\dfrac{\Gamma\Box, \pi\Diamond}{\Gamma, \pi}$

($\Diamond^+$) $\dfrac{\Gamma\Box}{\Gamma, \pi}$

Fig. 2. Inference rules of $\mathrm{IC}_G$



We define $S_0 := \{\Gamma \mid \Gamma \text{ is an axiom}\}$. A derivation of $\mathrm{IC}_G$ is a sequence of
sets of sequents $S_0 \vdash \cdots \vdash S_m$ where $S_i \vdash S_{i+1}$ iff $S_{i+1} = S_i \cup \{\Gamma\}$ such that
there exist sequents $\Gamma_1, \ldots, \Gamma_k \in S_i$ and $\dfrac{\Gamma_1 \ \ldots\ \Gamma_k}{\Gamma}$ is an inference.

We write $S_0 \vdash^* S$ iff there is a derivation $S_0 \vdash \cdots \vdash S_m$ with $S = S_m$. The closure
$S_0^{\vdash}$ of $S_0$ under $\vdash$ is defined by $S_0^{\vdash} = \bigcup\{S \mid S_0 \vdash^* S\}$. Again, the closure can
effectively be computed by starting with $S_0$ and then adding sequents that can
be obtained by an inference until no more new sequents can be added.

As shown in [19], the computation of the closure yields a decision procedure 
for K-satisfiability: 



Fact 1. $G$ is unsatisfiable iff $\{\epsilon\} \in S_0^{\vdash}$.
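The saturation process behind Fact 1 can be sketched as a worklist algorithm. The sketch below makes several assumptions that are not taken from the paper: the tuple encoding of formulae and paths, the function names, and the choice to always remove the principal path from a premise (i.e., to take $\Gamma$ as the premise without $\pi\wedge_*$, $\pi\vee_*$, or $\pi\Diamond$). It also relies on the rule set consisting of the inferences ($\vee$), ($\wedge_l$), ($\wedge_r$), ($\Diamond$), and ($\Diamond^+$) as described in Definition 4.

```python
def paths(g):
    """Map every G-path (a tuple of symbols) to the sub-formula it addresses."""
    table = {}

    def walk(path, f):
        table[path] = f
        if f[0] in ("and", "or"):
            walk(path + (f[0] + "_l",), f[1])
            walk(path + (f[0] + "_r",), f[2])
        elif f[0] in ("box", "dia"):
            walk(path + (f[0],), f[1])

    walk((), g)
    return table


def saturate(g):
    """Close the axioms under the inference rules; return all derived sequents."""
    table = paths(g)
    dia_paths = [pi for pi, f in table.items() if f[0] == "dia"]
    # Axioms: pairs of paths addressing p and not p.
    closure = {frozenset({a, b})
               for a, fa in table.items() if fa[0] == "var"
               for b, fb in table.items() if fb == ("not", fa)}
    todo = list(closure)

    def add(sequent):
        if sequent not in closure:
            closure.add(sequent)
            todo.append(sequent)

    while todo:
        s = todo.pop()
        for x in s:
            if not x:
                continue  # the empty path cannot be the principal path
            pi, last = x[:-1], x[-1]
            if last in ("and_l", "and_r"):          # rules (and_l), (and_r)
                add((s - {x}) | {pi})
            elif last in ("or_l", "or_r"):          # rule (or)
                partner = pi + ("or_r" if last == "or_l" else "or_l",)
                for t in list(closure):
                    if partner in t:
                        add((s - {x}) | (t - {partner}) | {pi})
        boxes = frozenset(x[:-1] for x in s if x and x[-1] == "box")
        dias = [x[:-1] for x in s if x and x[-1] == "dia"]
        if len(boxes) + len(dias) == len(s):
            if len(dias) == 1:                      # rule (dia)
                add(boxes | {dias[0]})
            elif not dias:                          # rule (dia+)
                for pi in dia_paths:
                    add(boxes | {pi})
    return closure


p1, p2 = ("var", "p1"), ("var", "p2")
G_unsat = ("and", ("dia", ("not", p1)),
                  ("and", ("box", p2), ("box", ("or", ("not", p2), p1))))
G_sat = ("and", ("dia", ("not", p1)), ("box", ("or", ("not", p2), p1)))
print(frozenset({()}) in saturate(G_unsat), frozenset({()}) in saturate(G_sat))
```

For the example formula of Fig. 1 the sequent consisting of the empty path is derived, whereas for the satisfiable variant without the conjunct $\Box p_2$ the closure contains only the single axiom.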



Fig. 3 shows the inferences of $\mathrm{IC}_G$ that lead to $\nu_0 = \epsilon$ for the example formula
from Fig. 1.



3.3 Connecting the Two Approaches 

The results shown in this subsection imply that $\mathrm{IC}_G$ can be viewed as an on-
the-fly implementation of the emptiness test for $\mathfrak{A}_G$. In addition to generating
states on-the-fly, states are also represented in a compact manner: one sequent
generated by $\mathrm{IC}_G$ represents several states of $\mathfrak{A}_G$.

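Definition 5. For the formula automaton $\mathfrak{A}_G$ with states $Q_G$ and a sequent
$\Gamma \subseteq \Pi_G$ we define $[\![\Gamma]\!] := \{\Phi \in Q_G \mid \Gamma \subseteq \Phi\}$, and for a set $S$ of sequents we
define $[\![S]\!] := \bigcup_{\Gamma \in S} [\![\Gamma]\!]$.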

The following theorem, which is one of the main contributions of this paper,
establishes the correspondence between the emptiness test and $\mathrm{IC}_G$. Its proof
will be sketched in the remainder of this section (see [3] for details).

Theorem 2 ($\mathrm{IC}_G$ and the emptiness test mutually simulate each
other). Let $Q_0$, $S_0$, $\rhd$, and $\vdash$ be defined as above.

1. Let $Q$ be a set of states such that $Q_0 \rhd^* Q$. Then there exists a set of sequents
$S$ with $S_0 \vdash^* S$ and $Q \subseteq [\![S]\!]$.

2. Let $S$ be a set of sequents such that $S_0 \vdash^* S$. Then there exists a set of states
$Q \subseteq Q_G$ with $Q_0 \rhd^* Q$ and $[\![S]\!] \subseteq Q$.

($\vee$) from the axioms $\wedge_r\wedge_l\Box, \wedge_r\wedge_r\Box\vee_l$ and $\wedge_l\Diamond, \wedge_r\wedge_r\Box\vee_r$: $\wedge_l\Diamond, \wedge_r\wedge_l\Box, \wedge_r\wedge_r\Box$
($\Diamond$): $\wedge_l, \wedge_r\wedge_l, \wedge_r\wedge_r$
($\wedge_r$): $\wedge_l, \wedge_r, \wedge_r\wedge_l$
($\wedge_l$): $\wedge_l, \wedge_r$
($\wedge_r$): $\epsilon, \wedge_l$
($\wedge_l$): $\epsilon$

Fig. 3. An example of inferences in $\mathrm{IC}_G$



The first part of the theorem shows that $\mathrm{IC}_G$ can simulate each computation of
the emptiness test for $\mathfrak{A}_G$. The set of states represented by the set of sequents
computed by $\mathrm{IC}_G$ may be larger than the one computed by a particular derivation
of the emptiness test. However, the second part of the theorem implies that all
these states are in fact inactive since a possibly larger set of states can also
be computed by a derivation of the emptiness test. In particular, the theorem
implies that $\mathrm{IC}_G$ can be used to calculate a compact representation of $Q_0^{\rhd}$. This
is an on-the-fly computation since $\mathfrak{A}_G$ is never constructed explicitly.

Corollary 1. $Q_0^{\rhd} = [\![S_0^{\vdash}]\!]$.

The proof of the second part of Theorem 2 is the easier one. It is a consequence 
of the next three lemmata. First, observe that the two calculi have the same 
starting points. 

Lemma 1. If $S_0$ is the set of axioms of $\mathrm{IC}_G$, and $Q_0$ is the set of states of $\mathfrak{A}_G$
that have no successor states, then $[\![S_0]\!] = Q_0$.

Second, since states are assumed to be p.e., propositional inferences of $\mathrm{IC}_G$
do not change the set of states represented by the sequents.

Lemma 2. Let $S \vdash T$ be a derivation of $\mathrm{IC}_G$ that employs a $\wedge_l$-, $\wedge_r$-, or a
$\vee$-inference. Then $[\![S]\!] = [\![T]\!]$.

Third, modal inferences of $\mathrm{IC}_G$ can be simulated by derivations of the empti-
ness test.

Lemma 3. Let $S \vdash T$ be a derivation of $\mathrm{IC}_G$ that employs a $\Diamond$- or $\Diamond^+$-inference.
If $Q$ is a set of states with $[\![S]\!] \cup Q_0 \subseteq Q$ then there exists a set of states $P$ with
$Q \rhd^* P$ and $[\![T]\!] \subseteq P$.

Given these lemmata, proving Theorem 2.2 is quite simple. 

Proof of Theorem 2.2. The proof is by induction on the length $m$ of the derivation
$S_0 \vdash S_1 \cdots \vdash S_m = S$ of $\mathrm{IC}_G$. The base case $m = 0$ is Lemma 1. For the induction
step, $S_{i+1}$ is either inferred from $S_i$ using a propositional inference, which is dealt
with by Lemma 2, or by a modal inference, which is dealt with by Lemma 3.
Lemma 3 is applicable since, for every set of states $Q$ with $Q_0 \rhd^* Q$, $Q_0 \subseteq Q$. □

Proving the first part of Theorem 2 is more involved because of the calculation
of the propositional expansions implicit in the definition of $\mathfrak{A}_G$.

Lemma 4. Let $\Phi \subseteq \Pi_G$ be a set of paths and $S$ a set of sequents such that
$\langle\langle\Phi\rangle\rangle \subseteq [\![S]\!]$. Then there exists a set of sequents $T$ with $S \vdash^* T$ such that there
exists a sequent $\Lambda \in T$ with $\Lambda \subseteq \Phi$.

Proof. If $\Phi$ is p.e., then this is immediate, as in this case $\langle\langle\Phi\rangle\rangle = \{\Phi\} \subseteq [\![S]\!]$.

If $\Phi$ is not p.e., then let select be an arbitrary selection function, i.e., a
function that maps every set $\Psi$ that is not p.e. to a $\bowtie$-path $\pi \in \Psi$ that is not
p.e. in $\Psi$. Let $\mathcal{T}_\Phi$ be the following, inductively defined tree:

- The root of $\mathcal{T}_\Phi$ is $\Phi$.

- If a node $\Psi$ of $\mathcal{T}_\Phi$ is not p.e., then

  - if select($\Psi$) $= \pi$ is an $\wedge$-path, then $\Psi$ has the successor node $\Psi, \pi\wedge_l, \pi\wedge_r$
    and $\Psi$ is called an $\wedge$-node.

  - if select($\Psi$) $= \pi$ is a $\vee$-path, then $\Psi$ has the successor nodes $\Psi, \pi\vee_l$ and
    $\Psi, \pi\vee_r$ and $\Psi$ is called a $\vee$-node.

- If a node $\Psi$ of $\mathcal{T}_\Phi$ is p.e., then it is a leaf of the tree.

Obviously, the construction is such that the set of leaves of $\mathcal{T}_\Phi$ is $\langle\langle\Phi\rangle\rangle$.

Let $\Psi_1, \ldots, \Psi_\ell$ be a post-order traversal of this tree, so the sons of a node
occur before the node itself and $\Psi_\ell = \Phi$. Along this traversal we will construct a
derivation $S = T_0 \vdash^* \cdots \vdash^* T_\ell = T$ such that, for every $1 \le i \le j \le \ell$, $T_j$ contains
a sequent $\Lambda_i$ with $\Lambda_i \subseteq \Psi_i$. Since the sets $T_j$ grow monotonically, it suffices to
show that, for every $1 \le i \le \ell$, $T_i$ contains a sequent $\Lambda_i$ with $\Lambda_i \subseteq \Psi_i$.

Whenever $\Psi_i$ is a leaf of $\mathcal{T}_\Phi$, then $\Psi_i \in \langle\langle\Phi\rangle\rangle \subseteq [\![S]\!]$. Hence there is already a
sequent $\Lambda_i \in T_0$ with $\Lambda_i \subseteq \Psi_i$ and no derivation step is necessary. Particularly,
in a post-order traversal, $\Psi_1$ is a leaf.

We now assume that the derivation has been constructed up to $T_i$. We restrict
our attention to the case where $\Psi_{i+1}$ is a $\vee$-node since the case where $\Psi_{i+1}$ is
an $\wedge$-node can be treated similarly and the case where $\Psi_{i+1}$ is a leaf as above.

Thus, assume that $\Psi_{i+1}$ is a $\vee$-node with selected $\vee$-path $\pi \in \Psi_{i+1}$. Then,
the successors of $\Psi_{i+1}$ in $\mathcal{T}_\Phi$ are $\Psi_{i+1}, \pi\vee_l$ and $\Psi_{i+1}, \pi\vee_r$, and by construction
there exist sequents $\Lambda_l, \Lambda_r \in T_i$ with $\Lambda_* \subseteq \Psi_{i+1}, \pi\vee_*$. If $\pi\vee_l \notin \Lambda_l$ or $\pi\vee_r \notin \Lambda_r$,
then $\Lambda_l \subseteq \Psi_{i+1}$ or $\Lambda_r \subseteq \Psi_{i+1}$ holds and hence already $T_i$ contains a sequent $\Lambda$
with $\Lambda \subseteq \Psi_{i+1}$.

If $\Lambda_l = \Gamma_l, \pi\vee_l$ and $\Lambda_r = \Gamma_r, \pi\vee_r$ with $\pi\vee_* \notin \Gamma_*$ then $\mathrm{IC}_G$ can use the
inference

($\vee$) $\dfrac{\Gamma_l, \pi\vee_l \qquad \Gamma_r, \pi\vee_r}{\Gamma_l, \Gamma_r, \pi}$

to derive $T_i \vdash T_i \cup \{\Gamma_l, \Gamma_r, \pi\} = T_{i+1}$, and $\Gamma_l, \Gamma_r, \pi \subseteq \Psi_{i+1}$ easily follows. □

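Proof of Theorem 2.1. We show this by induction on the number $k$ of steps in
the derivation $Q_0 \rhd \cdots \rhd Q_k = Q$. Again, Lemma 1 yields the base case.

For the induction step, let $Q_0 \rhd \cdots \rhd Q_i \rhd Q_{i+1} = Q_i \cup \{\Phi\}$ be a derivation
of the emptiness test and $S_i$ a set of sequents such that $S_0 \vdash^* S_i$ and $Q_i \subseteq [\![S_i]\!]$.
If already $\Phi \in Q_i$ then $Q_{i+1} \subseteq [\![S_i]\!]$ and we are done.

If $\Phi \notin Q_i$, then $Q_0 \subseteq Q_i$ implies that $\Delta_G(\Phi, \Phi) \neq \emptyset$. Since $\emptyset$ is an active
state, we know that $\emptyset \notin Q_i$, and for $Q_i \rhd Q_{i+1}$ to be a possible derivation of
the emptiness test, $\Delta_G(\Phi, \Phi) = \langle\langle\Psi_1\rangle\rangle \times \cdots \times \langle\langle\Psi_n\rangle\rangle \neq \{(\emptyset, \ldots, \emptyset)\}$ must hold,
i.e., there must be a $\Psi_j \neq \emptyset$ such that $\langle\langle\Psi_j\rangle\rangle \subseteq Q_i \subseteq [\![S_i]\!]$. Hence $\pi_j \in \Phi$ and
$\Psi_j = \{\pi_j\Diamond\} \cup \{\pi\Box \mid \pi \in \Phi \text{ is a } \Box\text{-path}\}$.

Lemma 4 yields the existence of a set of sequents $T_i$ with $S_i \vdash^* T_i$ containing
a sequent $\Lambda$ with $\Lambda \subseteq \Psi_j$. This sequent is either of the form $\Lambda = \Gamma\Box, \pi_j\Diamond$ or
$\Lambda = \Gamma\Box$ for some $\Gamma \subseteq \Phi$. In the former case, $\mathrm{IC}_G$ can use a $\Diamond$-inference and in
the latter case a $\Diamond^+$-inference to derive $S_0 \vdash^* S_i \vdash^* T_i \vdash T_i \cup \{\Gamma, \pi_j\} = S$ and
$\Phi \in [\![\Gamma, \pi_j]\!]$ holds. □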

4 Optimizations 

Since the inverse calculus can be seen as an on-the-fly implementation of the 
emptiness test, optimizations of the inverse calculus also yield optimizations of 
the emptiness test. We use the connection between the two approaches to provide 
an easier proof of the fact that the optimizations of $\mathrm{IC}_G$ introduced by Voronkov
[19] do not destroy completeness of the calculus. 

4.1 Unreachable States / Redundant Sequents 

States that cannot occur on any run starting with an initial state have no effect 
on the language accepted by the automaton. We call such states unreachable. In 
the following, we will determine certain types of unreachable states. 

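Definition 6. Let $\pi, \pi_1, \pi_2 \in \Pi_G$.

- The modal length of $\pi$ is the number of occurrences of $\Box$ and $\Diamond$ in $\pi$.

- $\pi_1, \pi_2 \in \Pi_G$ form a $\vee$-fork iff $\pi_1 = \pi\vee_l\pi_1'$ and $\pi_2 = \pi\vee_r\pi_2'$ for some $\pi, \pi_1', \pi_2'$.

- $\pi_1, \pi_2$ are $\Diamond$-separated iff $\pi_1 = \pi_1'\Diamond\pi_1''$ and $\pi_2 = \pi_2'\Diamond\pi_2''$ such that $\pi_1', \pi_2'$ have
  the same modal length and $\pi_1' \neq \pi_2'$.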
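The three notions of Definition 6 can be sketched as simple predicates on paths. The encoding of paths as tuples over the symbols `and_l`, `and_r`, `or_l`, `or_r`, `box`, `dia` and the function names are assumptions of this sketch.

```python
def modal_length(pi):
    """Number of occurrences of box and dia in the path pi."""
    return sum(1 for symbol in pi if symbol in ("box", "dia"))


def is_or_fork(pi1, pi2):
    """True iff the paths share a prefix that continues with or_l in one path
    and with or_r in the other."""
    for a, b in zip(pi1, pi2):
        if a != b:
            # The first position where the paths differ decides.
            return {a, b} == {"or_l", "or_r"}
    return False  # one path is a prefix of the other


def dia_separated(pi1, pi2):
    """True iff pi1 = a dia a' and pi2 = b dia b' for distinct prefixes a, b
    of the same modal length."""
    for i, s in enumerate(pi1):
        if s != "dia":
            continue
        for j, t in enumerate(pi2):
            if (t == "dia" and pi1[:i] != pi2[:j]
                    and modal_length(pi1[:i]) == modal_length(pi2[:j])):
                return True
    return False


print(modal_length(("and_r", "and_r", "box", "or_l")))
print(is_or_fork(("and_r", "or_l", "box"), ("and_r", "or_r")))
print(dia_separated(("and_l", "dia"), ("and_r", "dia")))
```

Intuitively, two paths in a $\vee$-fork commit to both disjuncts of the same disjunction, and two $\Diamond$-separated paths pass through different $\Diamond$-successors at the same modal depth.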

Lemma 5. Let $\mathfrak{A}_G$ be the formula automaton for a K-formula $G$ in NNF and
$\Phi \in Q_G$. If $\Phi$ contains a $\vee$-fork, two $\Diamond$-separated paths, or two paths of different
modal length, then $\Phi$ is unreachable.

The lemma shows that we can remove such states from $\mathfrak{A}_G$ without changing
the accepted language. Sequents containing a $\vee$-fork, two $\Diamond$-separated paths, or
two paths of different modal length represent only unreachable states, and are
thus redundant, i.e., inferences involving such sequents need not be considered.

Definition 7 (Reduced automaton). Let $\bar{Q}$ be the set of states of $\mathfrak{A}_G$ that
contain a $\vee$-fork, two $\Diamond$-separated paths, or two paths of different modal length.
The reduced automaton $\mathfrak{A}'_G = (Q'_G, \Sigma_G, \langle\langle\{\epsilon\}\rangle\rangle, \Delta'_G)$ is defined by

$$Q'_G := Q_G \setminus \bar{Q} \quad\text{and}\quad \Delta'_G := \Delta_G \cap (Q'_G \times \Sigma_G \times Q'_G \times \cdots \times Q'_G).$$

Since the states in $\bar{Q}$ are unreachable, $L(\mathfrak{A}_G) = L(\mathfrak{A}'_G)$. From now on, we consider
$\mathfrak{A}'_G$ and define $[\![\cdot]\!]$ relative to the states of $\mathfrak{A}'_G$: $[\![\Gamma]\!] = \{\Phi \in Q'_G \mid \Gamma \subseteq \Phi\}$.

4.2 G-Orderings / Redundant Inferences

In the following, the applicability of the propositional inferences of the inverse
calculus will be restricted to those where the affected paths are maximal w.r.t.
a total ordering of $\Pi_G$. In order to maintain completeness, one cannot consider
arbitrary orderings in this context.

Two paths $\pi_1, \pi_3$ are brothers iff there exists a $\bowtie$-path $\pi$ such that $\pi_1 = \pi\bowtie_l$
and $\pi_3 = \pi\bowtie_r$ or $\pi_1 = \pi\bowtie_r$ and $\pi_3 = \pi\bowtie_l$.

Definition 8 (G-ordering). Let $G$ be a K-formula in NNF. A total ordering
$\succ$ of $\Pi_G$ is called a G-ordering iff

1. $\pi_1 \succ \pi_2$ whenever

   a) the modal length of $\pi_1$ is strictly greater than the modal length of $\pi_2$; or

   b) $\pi_1, \pi_2$ have the same modal length, the last symbol of $\pi_1$ is $\bowtie_*$, and the
      last symbol of $\pi_2$ is $\odot$; or

   c) $\pi_1, \pi_2$ have the same modal length and $\pi_2$ is a prefix of $\pi_1$

2. There is no path between brothers, i.e., there exist no G-paths $\pi_1, \pi_2, \pi_3$ such
   that $\pi_1 \succ \pi_2 \succ \pi_3$ and $\pi_1, \pi_3$ are brothers.



For the example formula $G$ of Fig. 1, a G-ordering can be defined by setting
$\nu_9 \succ \nu_8 \succ \cdots \succ \nu_1 \succ \nu_0$. Voronkov [19] shows that G-orderings exist for every K-
formula $G$ in NNF. Using an arbitrary, but fixed G-ordering $\succ$, the applicability
of the propositional inferences is restricted as follows.



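Definition 9 (Optimized Inverse Calculus). For a sequent $\Gamma$ and a path $\pi$
we write $\pi \succeq \Gamma$ iff $\pi \succeq \pi'$ for every $\pi' \in \Gamma$.

- An inference ($\wedge_*$) $\dfrac{\Gamma, \pi\wedge_*}{\Gamma, \pi}$ respects $\succ$ iff $\pi\wedge_* \succeq \Gamma$.

- An inference ($\vee$) $\dfrac{\Gamma_l, \pi\vee_l \qquad \Gamma_r, \pi\vee_r}{\Gamma_l, \Gamma_r, \pi}$ respects $\succ$ iff $\pi\vee_l \succeq \Gamma_l$ and $\pi\vee_r \succeq \Gamma_r$.

- The $\Diamond$- and $\Diamond^+$-inferences always respect $\succ$.

The optimized inverse calculus $\mathrm{IC}_G^{\succ}$ works as $\mathrm{IC}_G$, but for each derivation $S_0 \vdash
\cdots \vdash S_k$ the following restrictions must hold:

- For every step $S_i \vdash S_{i+1}$, the employed inference respects $\succ$, and

- $S_i$ must not contain $\vee$-forks, $\Diamond$-separated paths, or paths of different modal
  length.

To distinguish derivations of $\mathrm{IC}_G$ and $\mathrm{IC}_G^{\succ}$, we will use the symbol $\vdash_\succ$ in deriva-
tions of $\mathrm{IC}_G^{\succ}$. In [19], correctness of $\mathrm{IC}_G^{\succ}$ is shown.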

Fact 2 ([19]). Let $G$ be a K-formula in NNF and $\succ$ a G-ordering. Then $G$ is
unsatisfiable iff $\{\epsilon\} \in S_0^{\vdash_\succ}$.



Using the correspondence between the inverse method and the emptiness test of
$\mathfrak{A}'_G$, we will now give an alternative, and in our opinion simpler, proof of this
fact. Since $\mathrm{IC}_G^{\succ}$ is merely a restriction of $\mathrm{IC}_G$, soundness (i.e., the if-direction of
the fact) is immediate.







Completeness requires more work. In particular, the proof of Lemma 4 needs
to be reconsidered since the propositional inferences are now restricted: we must
show that the $\bowtie$-inferences employed in that proof respect (or can be made to
respect) $\succ$. To this purpose, we will follow [19] and introduce the notion of
$\succ$-compactness. For $\succ$-compact sets, we can be sure that all applicable $\bowtie$-inferences
respect $\succ$. To ensure that all the sets $\Psi_i$ constructed in the proof of Lemma 4 are
$\succ$-compact, we again follow Voronkov and employ a special selection strategy.

Definition 10 ($\succ$-compact, select$_\succ$). Let $G$ be a K-formula in NNF and $\succ$ a
G-ordering. An arbitrary set $\Phi \subseteq \Pi_G$ is $\succ$-compact iff, for every $\bowtie$-path $\pi \in \Phi$
that is not p.e. in $\Phi$, $\pi\bowtie_* \succeq \Phi$.

The selection function select$_\succ$ is defined as follows: if $\Phi$ is not p.e., then let
$\{\pi_1, \ldots, \pi_m\}$ be the set of $\bowtie$-paths that are not p.e. in $\Phi$. From this set, select$_\succ$
selects the path $\pi_i$ such that the paths $\pi_i\bowtie_*$ are the two smallest elements in
$\{\pi_j\bowtie_* \mid 1 \le j \le m\}$.

The function select$_\succ$ is well-defined because of Condition (2) of G-orderings. The
definition of compact ensures that $\bowtie$-inferences applicable to not propositionally
expanded sequents respect $\succ$.

Lemma 6. Let $G$ be a K-formula in NNF, $\succ$ a G-ordering, and select$_\succ$ the
selection function as defined above. Let $\Phi = \{\epsilon\}$ or $\Phi = \Gamma\Box, \pi\Diamond$ with $\Box$-paths
$\Gamma$ and a $\Diamond$-path $\pi$, all of equal modal length. If $\mathcal{T}_\Phi$, as defined in the proof of
Lemma 4, is generated using select$_\succ$ as selection function, then every node $\Psi$ of
$\mathcal{T}_\Phi$ is $\succ$-compact.

The proof of this lemma can be found in [3]. It is similar to the proof of
Lemma 5.8.3 in [19]. Given this lemma, it is easy to show that the construc-
tion employed in the proof of Lemma 4 also works for $\mathrm{IC}_G^{\succ}$, provided that we
restrict the set $\Phi$ as in Lemma 6:

Lemma 7. Let $\Phi = \{\epsilon\}$ or $\Phi = \Gamma\Box, \pi\Diamond$ with $\Box$-paths $\Gamma$ and a $\Diamond$-path $\pi$ all
of equal modal length and $S$ a set of sequents such that $\langle\langle\Phi\rangle\rangle \subseteq [\![S]\!]$. Then there
exists a set of sequents $T$ with $S \vdash_\succ^* T$ such that there exists $\Lambda \in T$ with $\Lambda \subseteq \Phi$.

Alternative Proof of Fact 2. As mentioned before, soundness (the if-direction)
is immediate. For the only-if-direction, if $G$ is not satisfiable, then $L(\mathfrak{A}'_G) = \emptyset$
and there is a set of states $Q$ with $Q_0 \rhd^* Q$ and $\langle\langle\{\epsilon\}\rangle\rangle \subseteq Q$. Using Lemma 7 we
show that there is a derivation of $\mathrm{IC}_G^{\succ}$ that simulates this derivation, i.e., there
is a set of sequents $S$ with $S_0 \vdash_\succ^* S$ and $Q \subseteq [\![S]\!]$.

The proof is by induction on the length $m$ of the derivation $Q_0 \rhd \ldots \rhd
Q_m = Q$ and is totally analogous to the proof of Theorem 2. The base case is
Lemma 1, which also holds for $\mathrm{IC}_G^{\succ}$ and the reduced automaton. The induction
step uses Lemma 7 instead of Lemma 4, but this is the only difference.

Hence, $Q_0 \rhd^* Q$ and $\langle\langle\{\epsilon\}\rangle\rangle \subseteq Q$ implies that there exists a derivation $S_0 \vdash_\succ^* S$
such that $\langle\langle\{\epsilon\}\rangle\rangle \subseteq [\![S]\!]$. Lemma 7 yields a derivation $S \vdash_\succ^* T$ with $\{\epsilon\} \in T \subseteq S_0^{\vdash_\succ}$.

□

5 Global Axioms

When considering satisfiability of G w.r.t. the global axiom H, we must take 
subformulae of G and H into account. We address subformulae using paths in 
G and H. 

Definition 11 ((G, H)-Paths). For K-formulae $G, H$ in NNF, the set of
(G, H)-paths $\Pi_{G,H}$ is a subset of $\{\epsilon_G, \epsilon_H\} \cdot \{\vee_l, \vee_r, \wedge_l, \wedge_r, \Box, \Diamond\}^*$. The set $\Pi_{G,H}$
and the subformula $(G, H)|_\pi$ of $G, H$ addressed by a path $\pi \in \Pi_{G,H}$ are defined
inductively as follows:

- $\epsilon_G \in \Pi_{G,H}$ and $(G, H)|_{\epsilon_G} = G$, and $\epsilon_H \in \Pi_{G,H}$ and $(G, H)|_{\epsilon_H} = H$

- if $\pi \in \Pi_{G,H}$ and $(G, H)|_\pi = F_1 \wedge F_2$ then $\pi\wedge_l, \pi\wedge_r \in \Pi_{G,H}$, $(G, H)|_{\pi\wedge_l} = F_1$,
  $(G, H)|_{\pi\wedge_r} = F_2$, and $\pi$ is called $\wedge$-path.

- The other cases are defined analogously (see also Definition 1).

- $\Pi_{G,H}$ is the smallest set that satisfies the previous conditions.

The definitions of p.e. and clash are extended to subsets of $\Pi_{G,H}$ in the
obvious way, with the additional requirement that, for $\Phi \neq \emptyset$ to be p.e., $\epsilon_H \in \Phi$
must hold. This additional requirement enforces the global axiom.

Definition 12 (Formula Automaton w. Global Axiom). For K-formulae
$G, H$ in NNF, let $\{\pi_1, \ldots, \pi_n\}$ be an enumeration of the $\Diamond$-paths in $\Pi_{G,H}$. The n-
ary looping automaton $\mathfrak{A}_{G,H}$ is defined by $\mathfrak{A}_{G,H} := (Q_{G,H}, \Sigma_{G,H}, \langle\langle\{\epsilon_G\}\rangle\rangle, \Delta_{G,H})$,
where $Q_{G,H} := \Sigma_{G,H} := \{\Phi \subseteq \Pi_{G,H} \mid \Phi \text{ is p.e.}\}$ and the transition relation
$\Delta_{G,H}$ is defined as for the automaton $\mathfrak{A}_G$ in Definition 3.

Theorem 3. $G$ is satisfiable w.r.t. the global axiom $H$ iff $L(\mathfrak{A}_{G,H}) \neq \emptyset$.



Definition 13 (The Inverse Calculus w. Global Axiom). Let $G, H$ be K-
formulae in NNF and $\Pi_{G,H}$ the set of paths of $G, H$. Sequents are subsets of
$\Pi_{G,H}$, and operations on sequents are defined as before.

In addition to the inferences from Fig. 2, the inverse calculus for $G$ w.r.t.
the global axiom $H$, $\mathrm{IC}_{G,H}^{ax}$, employs the inference

(ax) $\dfrac{\Gamma, \epsilon_H}{\Gamma}$

From now on, $[\![\cdot]\!]$ is defined w.r.t. the states of $\mathfrak{A}_{G,H}$, i.e., $[\![\Gamma]\!] := \{\Phi \in Q_{G,H} \mid
\Gamma \subseteq \Phi\}$.



Theorem 4 ($\mathrm{IC}_{G,H}^{ax}$ and the emptiness test for $\mathfrak{A}_{G,H}$ simulate each
other). Let $\vdash_{ax}$ denote derivation steps of $\mathrm{IC}_{G,H}^{ax}$, and $\rhd$ derivation steps of the
emptiness test for $\mathfrak{A}_{G,H}$.

1. Let $Q \subseteq Q_{G,H}$ be a set of states such that $Q_0 \rhd^* Q$. Then there exists a set
of sequents $S$ with $S_0 \vdash_{ax}^* S$ and $Q \subseteq [\![S]\!]$.

2. Let $S$ be a set of sequents such that $S_0 \vdash_{ax}^* S$. Then there exists a set of states
$Q \subseteq Q_{G,H}$ with $Q_0 \rhd^* Q$ and $[\![S]\!] \subseteq Q$.

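Lemmas 1, 2, and 3, restated for $\mathfrak{A}_{G,H}$ and $\mathrm{IC}_{G,H}^{ax}$, can be shown as before.

The following lemma deals with the ax-inference of $\mathrm{IC}_{G,H}^{ax}$.

Lemma 8. Let $S \vdash_{ax} T$ be a derivation of $\mathrm{IC}_{G,H}^{ax}$ that employs an ax-inference.
Then $[\![S]\!] = [\![T]\!]$.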

The proof of Theorem 4.2 is now analogous to the proof of Theorem 2.2. For
the proof of Theorem 4.1, Lemma 4 needs to be re-proved because the change in
the definition of p.e. now also implies that $\epsilon_H \in \Psi$ holds for every set $\Psi \in \langle\langle\Phi\rangle\rangle$
for any $\Phi \neq \emptyset$. This is where the new inference ax comes into play. In all other
respects, the proof of Theorem 4.1 is analogous to the proof of Theorem 2.1.

Corollary 2. $\mathrm{IC}_{G,H}^{ax}$ yields an EXPTIME decision procedure for satisfiability
w.r.t. global axioms in K.

The following algorithm yields the desired procedure:

Algorithm 1. Let $G, H$ be K-formulae in NNF. To test satisfiability of $G$ w.r.t.
$H$, calculate $S_0^{\vdash_{ax}}$. If $\{\emptyset, \{\epsilon_G\}\} \cap S_0^{\vdash_{ax}} \neq \emptyset$, then answer "not satisfiable," and
"satisfiable" otherwise.

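Correctness of this algorithm follows from Theorems 3 and 4. If $G$ is not satisfiable
w.r.t. $H$, then $L(\mathfrak{A}_{G,H}) = \emptyset$, and there exists a set of states $Q$ with $Q_0 \rhd^* Q$
and $\langle\langle\{\epsilon_G\}\rangle\rangle \subseteq Q$. Thus, there exists a set of sequents $S$ with $S_0 \vdash_{ax}^* S$ such that
$Q \subseteq [\![S]\!]$. With (the appropriately reformulated) Lemma 4 there exists a set of
sequents $T$ with $S \vdash_{ax}^* T$ such that there is a sequent $\Lambda \in T$ with $\Lambda \subseteq \{\epsilon_G\}$.
Consequently, $\Lambda = \emptyset$ or $\Lambda = \{\epsilon_G\}$.

Conversely, since $S_0 \vdash_{ax}^* S_0^{\vdash_{ax}}$, there exists a set of (inactive) states $Q$ such
that $Q_0 \rhd^* Q$ and $[\![S_0^{\vdash_{ax}}]\!] \subseteq Q$. Since $\langle\langle\{\epsilon_G\}\rangle\rangle \subseteq [\![\{\epsilon_G\}]\!] \subseteq [\![\emptyset]\!]$, we know that
$\{\emptyset, \{\epsilon_G\}\} \cap S_0^{\vdash_{ax}} \neq \emptyset$ implies $\langle\langle\{\epsilon_G\}\rangle\rangle \subseteq Q$. Consequently, $L(\mathfrak{A}_{G,H}) = \emptyset$ and thus
$G$ is not satisfiable w.r.t. $H$.

For the complexity, note that there are only exponentially many sequents.
Consequently, it is easy to see that the saturation process that leads to $S_0^{\vdash_{ax}}$ can
be realized in time exponential in the size of the input formulae.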

6 Future Work 

There are several interesting directions in which to continue this work. First, sat-
isfiability in K (without global axioms) is PSPACE-complete whereas the inverse
method yields only an EXPTIME-algorithm. Can suitable optimizations turn this
into a PSPACE-procedure? Second, can the optimizations considered in Section 4
be extended to the inverse calculus with global axioms? Third, Voronkov consid- 
ers additional optimizations. Can they also be handled within our framework? 
Finally, can the correspondence between the automata approach and the inverse 
method be used to obtain inverse calculi and correctness proofs for other modal 
or description logics? 







References 

1. C. Areces, R. Gennari, J. Heguiabehere, and M. de Rijke. Tree-based heuristics in 
modal theorem proving. In W. Horn, editor, Proc. of ECAI2000, Berlin, Germany, 
2000. IOS Press, Amsterdam.

2. F. Baader and U. Sattler. An overview of tableau algorithms for description logics. 
Studia Logica, 2001. To appear. 

3. F. Baader and S. Tobies. The inverse method implements the automata approach 
for modal satisfiability. LTCS-Report 01-03, LuFG Theoretical Computer Science,
RWTH Aachen, Germany, 2001.
See http://www-lti.informatik.rwth-aachen.de/Forschung/Reports.html.

4. P. Blackburn, M. de Rijke, and Y. Venema. Modal Logic. Cambridge University 
Press, 2001. Publishing date May 2001, preliminary version available online from 
http://www.mlbook.org/.

5. C. Courcoubetis, M. Y. Vardi, P. Wolper, and M. Yannakakis. Memory efficient 
algorithms for the verification of temporal properties. In E. M. Clarke and R. P. 
Kurshan, editors, Proc. of Computer-Aided Verification (CAV ’90), volume 531 of 
LNCS, pages 233-242. Springer Verlag, 1991. 

6. F. M. Donini and F. Massacci. EXPTIME tableaux for ALC. Artificial Intelligence, 
124(1):87-138, 2000. 

7. W. F. Dowling and J. H. Gallier. Linear-time algorithms for testing the satisfiability 
of propositional Horn formulae. Journal of Logic Programming, 1(3):267-284, 1984.

8. R. Dyckhoff, editor. Proc. of TABLEAUX 2000, number 1847 in LNAI, St An- 
drews, Scotland, UK, 2000. Springer Verlag. 

9. R. Gerth, D. Peled, M. Y. Vardi, and P. Wolper. Simple on-the-fly automatic 
verification of linear temporal logic. In Proc. of the 15th International Symposium 
on Protocol Specification, Testing, and Verification, pages 3-18, Warsaw, Poland,
1995. Chapman & Hall. 

10. R. Goré. Tableau methods for modal and temporal logics. In M. D'Agostino,
D. M. Gabbay, R. Hähnle, and J. Posegga, editors, Handbook of Tableau Methods.
Kluwer, Dordrecht, 1998. 

11. I. Horrocks. Benchmark analysis with FaCT. In Dyckhoff [8], pages 62-66. 

12. U. Hustadt and R. A. Schmidt. MSPASS: Modal reasoning by translation and 
first-order resolution. In Dyckhoff [8], pages 67-71. 

13. R. E. Ladner. The computational complexity of provability in systems of modal 
propositional logic. SIAM Journal on Computing, 6(3):467-480, 1977. 

14. C. Lutz and U. Sattler. The complexity of reasoning with boolean modal logic. In 
F. Wolter, H. Wansing, M. de Rijke, and M. Zakharyaschev, editors, Preliminary
Proc. of AiML2000, Leipzig, Germany, 2000. 

15. R. A. Schmidt. Resolution is a decision procedure for many propositional modal 
logics. In M. Kracht, M. de Rijke, H. Wansing, and M. Zakharyaschev, editors. 
Advances in Modal Logic, Volume 1, volume 87 of Lecture Notes, pages 189-208.
CSLI Publications, Stanford, 1998. 

16. E. Spaan. Complexity of Modal Logics. PhD thesis, Univ. van Amsterdam, 1993. 

17. M. Y. Vardi and P. Wolper. Automata-theoretic techniques for modal logics of 
programs. Journal of Computer and System Sciences, 32:183-221, 1986.

18. M. Y. Vardi and P. Wolper. Reasoning about infinite computations. Information 
and Computation, 115:1-37, 1994. 

19. A. Voronkov. How to optimize proof-search in modal logics: new methods of prov- 
ing redundancy criteria for sequent calculi. ACM Transactions on Computational 
Logic, 1(4):35pp, 2001.




Deduction-Based Decision Procedure for a 
Clausal Miniscoped Fragment of FTL 



Regimantas Pliuskevicius 

Institute of Mathematics and Informatics, 
Akademijos 4, Vilnius 2600, LITHUANIA, 
regis@ktl.mii.lt



Abstract. A simple deduction-based decision procedure for the so-
called clausal miniscoped fragment of a first-order linear temporal logic
with temporal operators Next and Always is presented. The soundness
and completeness of the proposed decision procedure are proved.



1 Introduction 

Temporal logic has proved valuable for specifying various computer
and multi-agent systems. To use such specifications, however, it is necessary to
have techniques for reasoning about temporal logic formulas. Model-checking
methods are effective and automatic for propositional temporal formulas.
For more complex systems, however, it is necessary or convenient to employ a
first-order temporal logic (FTL, for short). FTL is a very expressive language.
Unfortunately, FTL is incomplete in general [11], but it becomes complete
[5, 12] after adding an $\omega$-type rule. 

In some particular cases FTL is finitarily complete and/or decidable (as is,
of course, the propositional case). Recently the decidability of the so-called
monodic fragment of FTL was proved in [4]. In this fragment all formulas
of FTL (without function symbols!) beginning with a temporal operator
$\bigcirc$ or $\Box$ (Next or Always) have at most one free variable (the
monodic condition). 

In this paper, we consider a so-called miniscoped fragment of FTL. A formula
$A$ of FTL is in miniscoped form if all negative (positive) occurrences of
$\forall$ ($\exists$, respectively) in $A$ occur only in formulas of the shape
$Q\bar{x}E(\bar{x})$, where $\bar{x} = x_1, \dots, x_n$ and $E(\bar{x})$ is an
elementary formula. The objects of consideration of the proposed decision
procedure CMSat are so-called CM-sequents with $n$-place ($n > 1$) predicate
symbols and function symbols. In this sense the presented fragment of FTL is
non-monodic. Other decidable non-monodic fragments of FTL are considered in
[8, 9, 10]. CM-sequents are a “miniscoped” version of Fisher’s normal form [2].
Since we consider the miniscoped fragment of FTL based on clauses from Fisher’s
normal form, we call the subclass of FTL considered here the clausal
miniscoped fragment of FTL. 

The proposed decision procedure CMSat is based on the saturation method
[6-10]. In the saturation method the notions of calculus and deduction-based
decision/semi-decision procedure are identical. A derivation in the proposed
procedure CMSat is constructed as a finite tree. The tree is constructed
automatically and satisfies the so-called loop property (see Lemma 11), which is
the main characteristic feature of the saturation method. Namely, each leaf
$S_i$ of the tree is either a traditional logical axiom or there exists a vertex
$S^*$ of the tree such that $S_i$ and $S^*$ satisfy some similarity relation. A
similar situation arises in Fisher’s resolution method (see, e.g., [1, 3]) for
propositional temporal logic. 

2 Description of Infinitary Sequent Calculi $G_{L\omega}$ and $G^*_{L\omega}$

The proposed decision procedure CMSat is justified by the infinitary calculi
$G_{L\omega}$ and $G^*_{L\omega}$, which contain the $\omega$-type rule. 

Definition 1 (term, elementary formula, formula). We assume that all
predicate symbols are flexible (i.e., change their value in time), and all constants
and function symbols are rigid (i.e., have time-independent meanings). A term
is defined as usual. An elementary formula is either the truth constant $\top$ or an
expression of the form $P(t_1, \dots, t_m)$, where $P$ is a predicate symbol and
each $t_i$ ($1 \le i \le m$) is a term. Formulas are defined as usual. 

In first-order linear temporal logic over infinite sequences we have that
$\bigcirc(A \odot B) \equiv \bigcirc A \odot \bigcirc B$ ($\odot \in \{\supset, \wedge, \vee\}$)
and $\bigcirc\sigma A \equiv \sigma\bigcirc A$ ($\sigma \in \{\neg, \Box, \forall x, \exists x\}$).
Relying on these equivalences, we can consider occurrences of the “next” operator
$\bigcirc$ only within formulas $\bigcirc^k E$ (a $k$-times “next” elementary formula $E$). For
the sake of simplicity, we “eliminate” the “next” operator and abbreviate the
formula $\bigcirc^k E$ as $E^k$ (i.e., as an elementary formula with the index $k$). We also
use the notation $A^k$ for an arbitrary formula $A$ in the following meaning. 

Definition 2 (index, atomic formula). 1) If $E$ is an elementary formula and
$i, k \in \omega$, $k \neq 0$, then $(E^i)^k := E^{i+k}$ ($E^0 := E$); $E^l$ ($l \ge 0$) is called an atomic
formula, and $E^l$ becomes elementary if $l = 0$; 2) $(A \odot B)^k := A^k \odot B^k$ if
$\odot \in \{\supset, \wedge, \vee\}$; $(\sigma A)^k := \sigma A^k$ if $\sigma \in \{\neg, \Box, \forall x, \exists x\}$. In case we want to
indicate the dependence of an atomic formula on the terms $t_1, \dots, t_n$, we write
$E^k(t_1, \dots, t_n)$ instead of $E^k$. For example, the expression
$\forall x (P^1(x) \supset Q^3(x))^1$ means the formula
$\forall x (\bigcirc\bigcirc P(x) \supset \bigcirc\bigcirc\bigcirc\bigcirc Q(x))$, or $\forall x (P^2(x) \supset Q^4(x))$. 
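The index notation of Definition 2 can be illustrated with a small sketch. It is not part of the paper: the class names `Atom`, `Unary`, `Binary` and the function `shift` are hypothetical, and the sample formula (inner indices 1 and 3, shifted by 1) is an assumption chosen so that the result is $\forall x (P^2(x) \supset Q^4(x))$, the outcome stated in the example above.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atom:
    pred: str
    args: tuple
    index: int = 0  # number of "next" operators in front of the atom


@dataclass(frozen=True)
class Unary:
    op: str  # e.g. "not", "always", "forall x", "exists x"
    body: "Formula"


@dataclass(frozen=True)
class Binary:
    op: str  # e.g. "implies", "and", "or"
    left: "Formula"
    right: "Formula"


Formula = Union[Atom, Unary, Binary]


def shift(formula, k):
    """Return formula^k: the index k is pushed down to every atom."""
    if isinstance(formula, Atom):
        return Atom(formula.pred, formula.args, formula.index + k)
    if isinstance(formula, Unary):
        return Unary(formula.op, shift(formula.body, k))
    return Binary(formula.op, shift(formula.left, k), shift(formula.right, k))


# Hypothetical sample: forall x (P^1(x) implies Q^3(x)), shifted by 1.
example = Unary("forall x",
                Binary("implies", Atom("P", ("x",), 1), Atom("Q", ("x",), 3)))
shifted = shift(example, 1)
```

Here `shifted` carries the indices 2 and 4 on `P` and `Q`, which is the clause-by-clause reading of the two items of the definition.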

Definition 3 (sequent, miniscoped sequent, quasi-atomic formula and
quasi-elementary formula). A sequent is an expression of the form $\Gamma \to \Delta$,
where $\Gamma$, $\Delta$ are arbitrary finite multisets (i.e., not sequences or
sets) of formulas. A sequent $S$ is a miniscoped sequent if all negative
(positive) occurrences of $\forall$ ($\exists$, respectively) in $S$ occur only in formulas of the
shape $Q\bar{x}E(\bar{x})$, where $\bar{x} = x_1, \dots, x_n$ ($n \ge 0$) and $E(\bar{x})$ is an atomic (elementary)
formula. Such a formula is called a quasi-atomic (quasi-elementary, respectively)
formula; if $Q\bar{x} = \emptyset$, then a quasi-atomic formula becomes an atomic one; if
$Q\bar{x} = \emptyset$ and $E$ is an elementary formula, then a quasi-atomic formula becomes
an elementary one. 
Now we shall consider some special form of a miniscoped version of Fisher’s 
normal form [2]. First we define so-called kernel formulas. 

Definition 4 (regular and non-regular kernel formula, indexed regular
and non-regular kernel formula). A formula $A$ is a regular kernel formula if
$A = \Box(\bigwedge_{i=1}^{m} \forall\bar{x}\,\sigma_i E_i(\bar{x}) \supset \bigvee_{j=1}^{n} \exists\bar{z}\,\sigma_j P_j^1(\bar{z}))$,
where $\sigma_i, \sigma_j \in \{\emptyset, \neg\}$; $E_i(\bar{x})$, $P_j(\bar{z})$ are some quasi-elementary formulas;
$\bar{x} = x_1, \dots, x_k$; $\bar{z} = z_1, \dots, z_k$ ($k \ge 0$). A formula $B$ is a non-regular kernel
formula if $B = \Box(\bigwedge_{i=1}^{m} \forall\bar{x}\,\sigma_i E_i(\bar{x}) \supset \neg\Box\sigma P)$, where $\sigma_i, \sigma \in \{\emptyset, \neg\}$;
$E_i(\bar{x})$, $P$ are some quasi-elementary formulas; $\bar{x} = x_1, \dots, x_k$ ($k \ge 0$). Let $A$
be a regular (non-regular) kernel formula. Then $A^k$ is an indexed regular
(non-regular, respectively) kernel formula. 



Definition 5 (CM-sequent, indexed CM-sequent, parametrical formulas
of CM-sequent, induction-free CM-sequent and indexed CM-sequent).
A miniscoped sequent $S$ is a CM-sequent if $S = \Sigma_1, \Box\Omega \to \Sigma_2, \Box^{\circ}\Delta$,
where $\Sigma_i = \emptyset$ ($i \in \{1, 2\}$) or consists of quasi-elementary formulas
(which are called parametrical formulas of CM-sequents); $\Box\Omega$ consists of
regular/non-regular kernel formulas; $\Box^{\circ} \in \{\emptyset, \Box\}$; $\Box\Delta = \emptyset$ or consists of
formulas of the shape $\Box\sigma E$, where $\sigma \in \{\emptyset, \neg\}$ and $E$ is a quasi-elementary
formula; $\Delta = \emptyset$ or consists of formulas of the shape $\sigma E$, where
$\sigma \in \{\emptyset, \neg\}$ and $E$ is a quasi-elementary formula. A miniscoped sequent $S$ is an
indexed CM-sequent if $S = \Sigma_{11}, \Sigma_{12}^{1}, \Box\Omega^{1} \to \Sigma_{21}, \Sigma_{22}^{1}, \Box^{\circ}\Delta^{1}$, where
$\Sigma_{i1}, \Sigma_{i2} = \emptyset$ ($i \in \{1, 2\}$) or consist of quasi-elementary formulas; $\Box\Omega$, $\Box^{\circ}\Delta$
mean the same as in the case of CM-sequents. If $\Box^{\circ}\Delta = \emptyset$ and $S$ does not contain
non-regular kernels, then the CM-sequent $S$ (indexed CM-sequent) is an induction-free
CM-sequent (indexed CM-sequent, respectively). 



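Definition 6 (calculus $G_{L\omega}$). The calculus $G_{L\omega}$ is defined by the following
postulates.

Axioms: $\Gamma, A \to \Delta, A$; $\Gamma \to \Delta, \top$.

Rules:

1) temporal rules

$$\frac{A, \Box A^1, \Gamma \to \Delta}{\Box A, \Gamma \to \Delta}\ (\Box \to) \qquad \frac{\Gamma \to \Delta, A;\ \dots;\ \Gamma \to \Delta, A^k;\ \dots}{\Gamma \to \Delta, \Box A}\ (\to \Box_\omega),$$

where $k \in \omega$; here and below $A^1$ means $B^{k+1}$ if $A = B^k$ ($k \ge 1$).

2) logical rules consist of the traditional invertible rules for $\supset, \wedge, \vee, \neg, \forall, \exists$. 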



Theorem 1. (a) (soundness and completeness of $G_{L\omega}$). Let $S$ be a sequent; then
$S$ is universally valid iff $G_{L\omega} \vdash S$.

(b) (admissibility of (cut)). $G_{L\omega} + (cut) \vdash S \Rightarrow G_{L\omega} \vdash S$.

Proof. Point (a) is proved by the Schütte method. Point (b) follows from point
(a). 

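Definition 7 (calculi $G^*_{L\omega}$, $G^*$). The calculus $G^*_{L\omega}$ is obtained from the
calculus $G_{L\omega}$ by the following transformations: (1) dropping the rules
$(\to\supset)$, $(\wedge\to)$, $(\to\vee)$, $(\forall\to)$, $(\to\exists)$ and (2) replacing the axiom $\Gamma, A \to \Delta, A$
by the following axioms:

$\Gamma, \Box A \to \Delta, \Box A$;

$\Gamma, E^k(t_1, \dots, t_n) \to \Delta, \exists x_1 \dots x_n E^k(x_1, \dots, x_n)$;

$\Gamma, \forall x_1 \dots x_n E^k(x_1, \dots, x_n) \to \Delta, E^k(t_1, \dots, t_n)$;

$\Gamma, \forall x_1 \dots x_n E^k(t_1(x_1), \dots, t_n(x_n)) \to \Delta, \exists y_1 \dots y_n E^k(p_1(y_1), \dots, p_n(y_n))$,

where $k \ge 0$, $E$ is a predicate symbol, and for all $i$ ($1 \le i \le n$) the terms $t_i(x_i)$ and $p_i(y_i)$ are
unifiable.

The calculus $G^*$ is obtained from $G^*_{L\omega}$ by dropping the rule $(\to\Box_\omega)$. 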
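Checking the last axiom amounts to a first-order unifiability test on pairs of terms. The following is a minimal sketch of such a test, not the author's procedure: the term encoding (variables as strings, compound terms and constants as tuples headed by the function symbol) and the names `unify`, `walk`, `occurs`, `resolve` are assumptions made for illustration. The sample system is the one quoted in Example 1(b), $g(y) = x$; $f(x, h(x), y) = f(g(z), w, z)$.

```python
def is_var(term):
    # Assumption: variables are plain strings, compound terms are tuples.
    return isinstance(term, str)


def walk(term, subst):
    """Follow variable bindings until an unbound variable or a compound term."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def occurs(var, term, subst):
    """Occurs check: does var appear in term under the current bindings?"""
    term = walk(term, subst)
    if is_var(term):
        return var == term
    return any(occurs(var, arg, subst) for arg in term[1:])


def unify(equations):
    """Return a substitution solving all equations, or None if none exists."""
    subst = {}
    stack = list(equations)
    while stack:
        left, right = stack.pop()
        left, right = walk(left, subst), walk(right, subst)
        if left == right:
            continue
        if is_var(left):
            if occurs(left, right, subst):
                return None
            subst[left] = right
        elif is_var(right):
            if occurs(right, left, subst):
                return None
            subst[right] = left
        elif left[0] == right[0] and len(left) == len(right):
            stack.extend(zip(left[1:], right[1:]))
        else:
            return None  # clash of function symbols or arities
    return subst


def resolve(term, subst):
    """Apply the substitution exhaustively to a term."""
    term = walk(term, subst)
    if is_var(term):
        return term
    return (term[0],) + tuple(resolve(arg, subst) for arg in term[1:])


# System of term equations quoted in Example 1(b).
system = [
    (("g", "y"), "x"),
    (("f", "x", ("h", "x"), "y"), ("f", ("g", "z"), "w", "z")),
]
mgu = unify(system)
```

For this system `unify` returns a substitution rather than `None`, so the two sides are unifiable; `resolve("w", mgu)` gives the fully instantiated binding of `w`.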

Theorem 2. Let $S$ be a CM-sequent. Then $G_{L\omega} \vdash S \iff G^*_{L\omega} \vdash S$. 

Proof. Follows from the definition of the CM-sequent using traditional proof- 
theoretical transformations. 



3 Some Auxiliary Tools of the Decision Procedure CMSat 



First we present the following notions. 

Definition 8 (reduction of a sequent $S$ to sequents $S_1, \dots, S_n$). Let $\{i\}$
denote a set of rules of a calculus. Then the $\{i\}$-reduction (or briefly, reduction)
of $S$ to a set of sequents $S_1, \dots, S_n$ (denoted by $R(S)\{i\} \Rightarrow \{S_1, \dots, S_n\}$ or
briefly by $R(S)$) is defined to be a tree of sequents with the root $S$ and leaves
$S_1, \dots, S_n$, and, possibly, axioms of the calculus $G^*_{L\omega}$, such that each sequent in
$R(S)$, different from $S$, is the “upper sequent” of a rule from $\{i\}$ whose “lower
sequent” also belongs to $R(S)$.

Now we define the rules by which the reduction of a CM-sequent $S$ to a set of
indexed CM-sequents is carried out. 



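Definition 9 (reduction rules, parametrically identical formulas). The
following rules will be called reduction rules (all these rules will be applied in the
bottom-up manner):

1) all logical rules of the calculus $G^*_{L\omega}$, i.e., the rules $(\supset\to)$, $(\vee\to)$,
$(\to\wedge)$, $(\to\neg)$, $(\neg\to)$, $(\exists\to)$, $(\to\forall)$;

2) the following temporal rules:

$$\frac{A, \Box A^1, \Gamma \to \Delta}{\Box A, \Gamma \to \Delta}\ (\Box \to) \qquad \frac{\Gamma \to \Delta, A;\ \ \Gamma \to \Delta, \Box A^1}{\Gamma \to \Delta, \Box A}$$
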
where A ^ A^; 
3) the following contraction rules:

$$\frac{E, \Gamma \to \Delta}{E, E^*, \Gamma \to \Delta}\ (C_E \to) \qquad \frac{\Gamma \to \Delta, E}{\Gamma \to \Delta, E, E^*}\ (\to C_E),$$

where $E$, $E^*$ are quasi-atomic formulas such that either $E = E^*$, or $E$, $E^*$ are
congruent, or $E$, $E^*$ differ only by corresponding occurrences of eigen-variables
of the rules $(\to\forall)$, $(\exists\to)$;

$$\frac{\Gamma \to \Delta, \Box A}{\Gamma \to \Delta, \Box A, \Box A^*}\ (\to C_\Box),$$

where either $A = A^*$, or $A$, $A^*$ are congruent, or $A$, $A^*$ differ only by
corresponding occurrences of eigen-variables of the rules $(\to\forall)$, $(\exists\to)$; such
formulas $A$ and $A^*$ are called parametrically identical. 
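One way to test the third case of this condition (formulas that differ only by corresponding occurrences of eigen-variables) is to rename the eigen-variables canonically and compare the results. The sketch below is an illustration under stated assumptions, not the paper's definition: formulas are encoded as nested tuples, the caller supplies the set of names that count as eigen-variables, "corresponding occurrences" is read as a consistent renaming, and congruence (renaming of bound variables) is not handled. The names `canonical` and `parametrically_identical` are hypothetical.

```python
def canonical(term, params, renaming=None):
    """Rename eigen-variables to _0, _1, ... in order of first occurrence."""
    if renaming is None:
        renaming = {}
    if isinstance(term, str):
        if term in params:
            # The fresh name is computed before the entry is inserted.
            renaming.setdefault(term, f"_{len(renaming)}")
            return renaming[term]
        return term
    return tuple(canonical(part, params, renaming) for part in term)


def parametrically_identical(a, b, params):
    """True if a and b coincide up to a consistent renaming of eigen-variables."""
    return canonical(a, params) == canonical(b, params)


# P(b1) and P(b3) differ only by the eigen-variables b1, b3.
same = parametrically_identical(("P", "b1"), ("P", "b3"), {"b1", "b3"})
# R(b1, b1) and R(b2, b3) do not: the renaming would have to be inconsistent.
different = parametrically_identical(
    ("R", "b1", "b1"), ("R", "b2", "b3"), {"b1", "b2", "b3"})
```

Merging parametrically identical formulas then reduces to keeping one representative per canonical form.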



Lemma 1. The temporal rule for $\Box$ in the succedent (see Definition 9) and the
contraction rules $(C_E \to)$, $(\to C_E)$, $(\to C_\Box)$ are admissible and invertible in $G^*_{L\omega}$.

Proof. The admissibility and invertibility of the temporal rule follow from the fact that
$G^*_{L\omega} \vdash \Box A \equiv A \wedge \Box A^1$ and Theorems 1(b), 2. The admissibility and invertibility
of the contraction rules are obvious. 



Lemma 2 (reduction to a set of indexed CM-sequents). Let $S$ be a CM-sequent.
Then one can construct $R(S)\{i\} \Rightarrow \{S_1, \dots, S_n\}$, where for all $j$ ($1 \le j \le n$)
$S_j$ is an indexed CM-sequent; $\{i\}$ is the set of reduction rules; moreover,
$G^*_{L\omega} \vdash S \Rightarrow G^*_{L\omega} \vdash S_j$, $j \in \{1, \dots, n\}$.

Proof. Using Lemma 1 and applying the reduction rules from $\{i\}$ bottom-up. 



Definition 10 (proper reduction and proper reduction-tree to indexed
CM-sequents: $R^*(S)$ and $R^*T(S)$, successful construction of $R^*T(S)$).
Let $R(S) = \{S_1, \dots, S_n\}$ be the reduction of a CM-sequent $S$ to the set of
indexed CM-sequents $\{S_1, \dots, S_n\}$. Then the set $R^*(S)$ obtained from $R(S)$ by
dropping axioms of $G^*_{L\omega}$ is a proper reduction of the CM-sequent $S$ to indexed
CM-sequents $S_1, \dots, S_n$. A derivation consisting of bottom-up applications of
reduction rules and having a CM-sequent as the root and the set $R^*(S)$ as the
leaves is a proper reduction-tree of $S$ to indexed CM-sequents, and it is denoted
by $R^*T(S)$. If $R^*(S)$ does not contain an induction-free indexed CM-sequent,
then we say that the construction of $R^*T(S)$ is successful (in notation $R^*T(S) \neq \bot$).
In the opposite case, we say that the construction of $R^*T(S)$ is not successful
(in symbols $R^*T(S) = \bot$). 

Lemma 3 (decidability of $R^*T(S)$). Let $S$ be a CM-sequent; then the problem
of construction of $R^*T(S)$ is decidable. Moreover, if $G^*_{L\omega} \vdash S$, then
$\forall i\, (G^*_{L\omega} \vdash S_i)$, where $S_i \in R^*(S)$.

Proof. Let $S$ be a CM-sequent; then using Lemma 2 one can automatically
reduce $S$ to a set $R(S)$ of indexed CM-sequents. Afterwards, using the decidability
of the axioms of $G^*_{L\omega}$ from $R(S)$, one can automatically get $R^*T(S)$. Then, using
the syntactical notion of an induction-free indexed CM-sequent, one can
automatically verify whether $R^*T(S) \neq \bot$ or $R^*T(S) = \bot$. We get from Lemma 2
that, if $G^*_{L\omega} \vdash S$, then $G^*_{L\omega} \vdash S_i$, where $S_i \in R^*(S)$. 



Definition 11 (separation rule: (S)). Let us introduce the following rule
(which will be applied in the bottom-up manner):

$$\frac{S^+ = \Pi_1, \Box\Omega \to \Pi_2, \Box A}{S^* = \Sigma_1, \Pi_1^1, \Box\Omega^1 \to \Sigma_2, \Pi_2^1, \Box A^1}\ (S),$$

where $S^*$ is an indexed CM-sequent and $\Sigma_1, \Pi_1^1 \to \Sigma_2, \Pi_2^1$ is not an axiom of
$G^*_{L\omega}$; $S^+$ is a CM-sequent. 



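Lemma 4 (invertibility of (S)). Let $S^* = \Sigma_1, \Pi_1^1, \Box\Omega^1 \to \Sigma_2, \Pi_2^1, \Box A^1$ be
an indexed CM-sequent such that $\Sigma_1, \Pi_1^1 \to \Sigma_2, \Pi_2^1$ is not an axiom of $G^*_{L\omega}$. Then
$G^*_{L\omega} \vdash S^* \Rightarrow G^*_{L\omega} \vdash \Pi_1, \Box\Omega \to \Pi_2, \Box A$.

Proof. By proving (using induction on the height) the invertibility of (S) in $G_{L\omega}$
and using Theorem 2. 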



4 Description of a Decidable Calculus CMSat

In this section, the decidable calculus CMSat for CM-sequents will be described.
First let us introduce the following rule. 

Definition 12 (structural rule $(W^*)$, parametrically identical CM-sequents).
CM-sequents $S$ and $S'$ are parametrically identical (in symbols
$S \approx S'$) if $S$, $S'$ differ only by parametrically identical formulas. Let us introduce
the following structural rule:

$$\frac{\Gamma \to \Delta}{\Pi, \Gamma' \to \Delta', \Theta}\ (W^*),$$

where $\Gamma \to \Delta \approx \Gamma' \to \Delta'$. 



Definition 13 (subsumed CM-sequent). We say that a CM-sequent $S_1$ subsumes
a CM-sequent $S_2$, or $S_2$ is subsumed by $S_1$ (in symbols $S_1 \succeq S_2$), if
$S_2$ can be obtained from $S_1$ by the rule $(W^*)$ (in a special case, $S_1 = S_2$). 
Definition 14 (subsumption rule $(Sub^+)$, subsumption-tree $(ST^+)$, active
and passive parts of $(ST^+)$). The subsumption rule is the following rule
(it is applied in the bottom-up manner):

$$\frac{S_1, \dots, S_i^{\circ}, \dots, S_{j-1}, S_{j+1}, \dots, S_n}{S_1, \dots, S_i, \dots, S_j, \dots, S_n}\ (Sub^+),$$

where $S_1, \dots, S_n$ are CM-sequents and $S_i \succeq S_j$ ($i \in \{1, \dots, j-1, j+1, \dots, n\}$);
$+ \in \{\emptyset, *\}$, $S_i^{\circ} = \emptyset$ if $+ = *$, otherwise $S_i^{\circ} = S_i$.

The subsumption-tree of CM-sequents $S_1, \dots, S_n$ is defined by the following
bottom-up deduction (and denoted by $(ST^+)$):

$$\frac{\dfrac{S'_1, \dots, S'_k}{\vdots}\ (Sub^+)}{S_1, \dots, S_n}\ (Sub^+),$$

where the set of CM-sequents $M = \{S'_1, \dots, S'_k\}$ is such that it is impossible
to apply $(Sub^+)$ bottom-up to the set $M$. The sequents from $(ST^+)$ which subsume
some sequents from $(ST^+)$ will be called the active part of $(ST^+)$, and the
sequents of $(ST^+)$ which are subsumed will be called the passive part of $(ST^+)$. 



Definition 15 (resolvent-tree $ReT(S)$ and resolvent $Re(S)$ of a CM-sequent
$S$). The resolvent-tree of a CM-sequent $S$ is defined by the following
bottom-up deduction (denoted by $ReT(S)$):

$$\dfrac{\dfrac{\dfrac{S_1, \dots, S_k}{S_1^+, \dots, S_m^+}\ (ST)}{S_1^*, \dots, S_m^*}\ (S)}{S}\ R^*T(S),$$

where $\{S_1^*, \dots, S_m^*\}$ is the set of indexed CM-sequents; $\{S_1^+, \dots, S_m^+\}$ and
$\{S_1, \dots, S_k\}$ are sets of CM-sequents. The latter set is the resolvent of the
CM-sequent $S$ and is denoted by $Re(S)$. If $R^*T(S) = \bot$ (i.e., if the set $\{S_1^*, \dots, S_m^*\}$
contains an induction-free CM-sequent), then we say that the construction of
$ReT(S)$ is not successful (in symbols $ReT(S) = \bot$). If for all $i$ ($1 \le i \le m$) $S_i^*$ is an
axiom, then $Re(S) = \emptyset$. 



Lemma 5 (decidability of $ReT(S)$). Let $S$ be a CM-sequent; then the problem
of construction of $ReT(S)$ is decidable. If $G^*_{L\omega} \vdash S$, then either $Re(S) = \emptyset$
or $\forall i\, (G^*_{L\omega} \vdash S_i)$, where $S_i \in Re(S)$. Moreover, if $S$ is an induction-free
CM-sequent, then instead of the calculus $G^*_{L\omega}$ we have the calculus $G^*$.

Proof. Analogous to the proof of Lemma 3, using Lemmas 3, 4.

By one of the saturation-based paradigms, i.e., “calculus = deductive
decision/semi-decision procedure”, we can consider the procedure $ReT(S)$ as a
calculus. 

Definition 16 (induction-free CM-sequent $S$ derivable in $ReT(S)$). An
induction-free CM-sequent $S$ is derivable in $ReT(S)$ (in symbols: $ReT(S) \vdash S$)
if $Re(S) = \emptyset$.

Lemma 6. Let $S$ be an induction-free CM-sequent. Then $ReT(S) \vdash S \iff G^* \vdash S$.

Proof. Let $ReT(S) \vdash S$, i.e., $Re(S) = \emptyset$. It means that all indexed CM-sequents
$S_i^*$ ($1 \le i \le n$) are axioms, i.e., $G^* \vdash S$. Let $G^* \vdash S$; then, using the invertibility
of reduction rules, we get that all indexed CM-sequents $S_i^*$ ($1 \le i \le n$) are
axioms, i.e., $Re(S) = \emptyset$. 



Lemma 7 (decision procedure for induction-free CM-sequents). Let $S$
be an induction-free CM-sequent. Then the problem of derivability in $ReT(S)$ is
decidable.

Proof. Follows from Lemma 5.

To define the main deductive procedure of the proposed decidable saturation-based
procedure CMSat, let us define some auxiliary notions and prove some
lemmas that are of independent interest. 

Definition 17 (resolvent subformulas of kernel formulas, of a set of
kernel formulas and of a CM-sequent; parametrically finite set of formulas).
Let $A = \Box(\bigwedge_{i=1}^{m} \forall\bar{x}\,\sigma_i E_i(\bar{x}) \supset \bigvee_{j=1}^{n} \exists\bar{z}\,\sigma_j P_j^1(\bar{z}))$ (where $\sigma_i, \sigma_j \in \{\emptyset, \neg\}$,
$\bar{x} = x_1, \dots, x_l$; $\bar{z} = z_1, \dots, z_l$ ($l \ge 0$), $E_i(\bar{x})$, $P_j(\bar{z})$ are some quasi-elementary
formulas) be a regular kernel formula. Then the resolvent subformulas of $A$
(denoted by $RSub(A)$) are the set $\{P_1(\bar{b}_1), \dots, P_n(\bar{b}_n)\}$, where $\bar{b}_j = b_{j1}, \dots, b_{jl}$
($1 \le j \le n$; $l \ge 0$) and any $b_{ji}$ ($1 \le i \le l$) is a new variable. Let
$A = \Box(\bigwedge_{i=1}^{m} \forall\bar{x}\,\sigma_i E_i(\bar{x}) \supset \neg\Box\sigma E)$ (where $\sigma_i, \sigma \in \{\emptyset, \neg\}$; $E_i(\bar{x})$ ($1 \le i \le m$),
$E$ are some quasi-elementary formulas) be a non-regular kernel formula. Then
the resolvent subformula of $A$ is the formula of the shape $\Box\sigma E$. Let $\Box\Omega$ be a set
of kernel formulas $A_1, \dots, A_n$. Then the resolvent subformulas of $\Box\Omega$ (denoted by
$RSub(\Box\Omega)$) are the set $\bigcup_{i=1}^{n} RSub(A_i)$. Let $S = \Sigma_1, \Box\Omega \to \Sigma_2, \Box A_1, \dots, \Box A_n$ be
a CM-sequent. Then the resolvent subformulas of $S$ (denoted by $RSub(S)$) are the
set $RSub(\Box\Omega) \cup \bigcup_{i=1}^{n} \Box A_i$. $R^*Sub(S)$ means the set obtained from $RSub(S)$ by
dropping the truth constant $\top$ and merging the formulas that are parametrically
identical (see Definition 9). Let $M$ be a set of formulas and $M^*$ be the set
obtained from $M$ by merging the formulas that are parametrically identical. Then
the set $M$ is parametrically finite if $M^*$ is finite. 
Lemma 8. Let $S = \Sigma_1, \Box\Omega \to \Sigma_2, \Box A$ be a CM-sequent. Then the sets
$R^*Sub(\Box\Omega)$ and $R^*Sub(S)$ are parametrically finite.

Proof. Follows from Definition 17. 

Definition 18 (saturated CM-sequent). Let $S$ be a non-induction-free CM-sequent
not containing non-regular kernels. Then $S$ is saturated if
$S = \Sigma_1, \Box\Omega \to \Sigma_2, \Box A$ and $\Sigma_1, \Sigma_2 \subseteq R^*Sub(\Box\Omega)$. Let $S$ be a CM-sequent containing
non-regular kernels and let $Re(S) = \{S_1, \dots, S_n\}$. Then for all $i$ ($1 \le i \le n$) $S_i$ is a
saturated CM-sequent.

Lemma 9 (reducing to saturated CM-sequents). Let $S = \Sigma_1, \Box\Omega \to \Sigma_2, \Box^{\circ}A$
(where $\Box^{\circ}A \in \{\emptyset, \Box A\}$) be a non-induction-free CM-sequent. Then
either $S$ does not contain a non-regular kernel and $S = \Sigma_1, \Box\Omega \to \Sigma_2, \Box A$ (where
$\Sigma_1, \Sigma_2 \subseteq R^*Sub(\Box\Omega)$), or $ReT(S) = \bot$, or $Re(S) = \emptyset$, or $S$ contains non-regular
kernels and $Re(S) = \{S_1, \dots, S_n\}$, where for all $i$ ($1 \le i \le n$) $S_i$ is a saturated
CM-sequent and $S_i = \Sigma_{1i}, \Box\Omega \to \Sigma_{2i}, \Box\theta, \Box^{\circ}A$, where $\Sigma_{1i}, \Sigma_{2i}, \Box\theta \subseteq R^*Sub(\Box\Omega)$.

Proof. Follows from the definitions of $Re(S)$ and $R^*Sub(\Box\Omega)$. 



Lemma 10. Let S be a non-induction-free CM-sequent. Then the problem of 
reduction of the CM-sequent S to a set of saturated CM-sequents is decidable. 

Proof. Follows from Lemma 5. 



We define now the main deductive tool of the proposed decision saturation- 
based procedure CMSat, which will be applied to a saturated CM-sequent. 

Definition 19 ($k$-th resolvent-tree $Re^kT(S)$ and $k$-th resolvent $Re^k(S)$
of a saturated CM-sequent $S$). Let $S = \Sigma_1, \Box\Omega \to \Sigma_2, \Box^{\circ}\theta, \Box^{\circ}A$ (where
$\Sigma_1, \Sigma_2, \Box^{\circ}\theta \subseteq R^*Sub(\Box\Omega)$ and $\Box^{\circ}\theta \in \{\emptyset, \Box\theta\}$, $\Box^{\circ}A \in \{\emptyset, \Box A\}$, $\Box^{\circ}\theta, \Box^{\circ}A \neq \emptyset$)
be a saturated CM-sequent. Then $Re^0(S) = Re^0T(S) = S$. Let $Re^k(S)
= \{S_1, \dots, S_n\}$; then $Re^{k+1}T(S)$ and $Re^{k+1}(S)$ are defined by the following
bottom-up deduction:

$$\begin{array}{c}
Re^{k+1}(S) \\ \hline
R^{+}e^{k+1}(S),\ \bigcup_{i=0}^{k} Re^{i}(S) \qquad (ST^*) \\ \hline
ReT(S_1) \quad \dots \quad ReT(S_n) \qquad (ST) \\
S_1 \quad \dots \quad S_n \\ \hline
Re^{k}(S)
\end{array}$$

The bottom-up application of $(ST)$ reduces the set $\bigcup_{i=1}^{n} Re(S_i)$ to the set
$R^{+}e^{k+1}(S)$ of saturated CM-sequents. The bottom-up application of $(ST^*)$
reduces the set $R^{+}e^{k+1}(S) \cup \bigcup_{i=0}^{k} Re^{i}(S)$ to the set $Re^{k+1}(S)$, which will be called
the $(k+1)$-th resolvent of the saturated CM-sequent $S$. Moreover, the members
of $\bigcup_{i=0}^{k} Re^{i}(S)$ are the active part of the application of $(ST^*)$ and the members of
$R^{+}e^{k+1}(S)$ are the passive ones. If $\exists i$ such that $ReT(S_i) = \bot$, then $Re^{k+1}(S) = \bot$
and $Re^{k+1}T(S) = \bot$, i.e., the construction of $Re^{k+1}T(S)$ is unsuccessful.
The set $Re^{k+1}(S)$ is empty in the two following cases: (1) for all $i$ ($1 \le i \le n$) $Re(S_i) = \emptyset$,
or (2) the bottom-up application of $(ST^*)$ in $Re^{k+1}T(S)$ yields an empty set. 

The notation $Re^k(S) \neq \bot$ ($k \in \omega$) means that the construction of $Re^k(S)$ is
successful for all $k \in \omega$.

Now we establish the main property of the procedure $Re^k(S)$.

Lemma 11 (loop property). Let $Re^k(S) \neq \bot$ ($k \in \omega$). Then there exists a
finite natural number $p$ such that $p \le |R^*Sub(S)|$ and $Re^p(S) = \emptyset$.

Proof. Let $S = \Sigma_1, \Box\Omega \to \Sigma_2, \Box^{\circ}\theta, \Box^{\circ}A$ and $Re^kT(S) \neq \bot$ ($k \in \omega$). Then
from the definition of $Re^k(S)$ it follows that either $\forall i\, (i \ge 1)$ $Re^i(S) = \emptyset$, or
$Re^k(S) = \{S_1, \dots, S_n\}$, where $S_i$ ($1 \le i \le n$) $= \Sigma_{i1}, \Box\Omega \to \Sigma_{i2}, \Box^{\circ}\theta_i, \Box^{\circ}A_i$,
where $\Sigma_{i1}, \Sigma_{i2}, \Box^{\circ}\theta_i \subseteq RSub(\Box\Omega)$ and $\Box^{\circ}A_i \subseteq \Box A$. Lemma 8 implies that the set
$RSub(S)$ is parametrically finite. Therefore, there must be a finite number $l$ such
that for all $i$, if $S_i \in R^*e^l(S)$, then $\exists j$ ($0 \le j \le l-1$) such that if $S_j \in Re^j(S)$,
then $S_j \succeq S_i$, i.e., $Re^l(S) = \emptyset$. It is easy to verify that $l \le |R^*Sub(S)|$. 
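The loop property suggests the following control structure for iterating resolvents. This is only a schematic sketch, assuming that sequents are opaque values and that the caller supplies two hypothetical callables: `resolvent`, standing for the construction of $Re(S)$ (returning `None` for an unsuccessful construction), and `subsumes`, standing for the subsumption relation. It also merges the two subsumption steps $(ST)$ and $(ST^*)$ into one filter, which is a simplification of the deduction described in Definition 19.

```python
def saturate(start, resolvent, subsumes, max_rounds):
    """Iterate resolvents until a round adds nothing new.

    Returns (k, kept), where k is the first round whose resolvent is empty
    and kept lists the sequents retained so far, or None if some resolvent
    construction fails.
    """
    kept = [start]      # sequents kept so far (the active part)
    current = [start]   # the current resolvent
    for k in range(1, max_rounds + 1):
        candidates = []
        for sequent in current:
            result = resolvent(sequent)
            if result is None:          # unsuccessful construction
                return None
            candidates.extend(result)
        fresh = []
        for candidate in candidates:
            # Drop every candidate subsumed by something already kept.
            if any(subsumes(old, candidate) for old in kept + fresh):
                continue
            fresh.append(candidate)
        if not fresh:                   # the loop has closed
            return k, kept
        kept.extend(fresh)
        current = fresh
    raise RuntimeError("no loop found within max_rounds")


# Toy abstraction of Example 1(c): "P" and "Q" stand for the sequents whose
# parametrical formula is P(b_i) or Q(b_i); every sequent yields both.
rounds, kept = saturate("P", lambda s: ["P", "Q"], lambda a, b: a == b, 10)
```

In this toy run the first round keeps only `"Q"` and the second round adds nothing, which mirrors the pattern $Re^2(S) = \emptyset$ reported in Example 1(c).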

Lemma 12 (decidability of the relation $Re^p(S) = \emptyset$). The relation
$Re^p(S) = \emptyset$ is decidable.

Proof. Follows from Lemmas 5, 11. 

Definition 20 (calculus CMSat, CM-sequent derivable in CMSat). Let
$S$ be an induction-free CM-sequent. Then the calculus CMSat consists of the
procedure of construction of $ReT(S)$ (see Definition 15). Let $S$ be a non-induction-free
CM-sequent; then the calculus CMSat consists of two procedures: (1) a
preliminary procedure of reduction of $S$ to a set of saturated CM-sequents and (2)
the procedure of construction of $Re^k(S)$. An induction-free CM-sequent $S$ is
derivable in CMSat (in symbols $CMSat \vdash S$) if $Re(S) = \emptyset$; in the opposite case,
$CMSat \nvdash S$. A non-induction-free CM-sequent $S$ is derivable in CMSat if either
$Re(S) = \emptyset$ or 1) $Re(S) = \{S_1, \dots, S_n\}$ (i.e., if $S$ can be reduced to saturated
CM-sequents $S_1, \dots, S_n$) and 2) for all $i$ ($1 \le i \le n$) $Re^p(S_i) = \emptyset$ (with $p$ as in
Lemma 11); in the opposite case, $CMSat \nvdash S$. 

Theorem 3 (decidability of CMSat). The calculus CMSat is decidable for 
any CM-sequent. 

Proof. Follows from Lemmas 5, 12. 

Let us introduce some notions that will be used in a so-called invariant
calculus CMIN (see below).

Definition 21 (saturation set, decomposition of saturation set, nucleus
of decomposition of saturation set). Let $S$ be a saturated CM-sequent and
$Re^n(S) = \emptyset$. Let us construct the set $\bigcup_{i=0}^{n-1} Re^i(S)$. Then the set obtained from
$\bigcup_{i=0}^{n-1} Re^i(S)$ by applying the rule $(Sub)$ is a saturation set of the saturated
sequent $S$ and is denoted by $Sat\{S\}$. A set $\{Sat_i\{S\}\}$ is a decomposition of the set $Sat\{S\}$
if: (1) $Sat\{S\} = \bigcup_i Sat_i\{S\}$; (2) $\forall i, j\ (i \neq j)$: $Sat_i\{S\} \cap Sat_j\{S\} = \emptyset$; (3) if
$S_1, S_2 \in Sat_i\{S\}$, then $S_1 = \Sigma_1, \Box\Omega \to \Sigma_2, \Box A, \Box^{\circ}\Delta_1$; $S_2 = \Sigma'_1, \Box\Omega \to \Sigma'_2, \Box A, \Box^{\circ}\Delta_2$;
i.e., $S_1$, $S_2$ have a common succedent member $\Box A$, which is a nucleus of $Sat_i\{S\}$. 

Now let us give some examples which demonstrate the main features of the
presented decision procedure CMSat.

Example 1. (a) Let $S = \forall zP(z), \Box\Omega \to$, where $P$ is a one-place predicate symbol,
$\Box\Omega = \Box(\forall xP(x) \supset \exists xP^1(x)), \Box(\forall yP(y) \supset \neg\Box\neg E)$ and $E$ is a quasi-elementary
formula of the shape $\forall yP(y)$. So, $S$ is a CM-sequent containing one regular
kernel and one non-regular kernel. First, let us construct a proper reduction
of the CM-sequent $S$ to the set of indexed CM-sequents, i.e., let us construct
$R^*(S)$ and $R^*T(S)$ (see Definition 10). Applying the reduction rules bottom-up, from
$S$ we get $R^*(S) = \{S_1, S_2\}$, where $S_1 = \forall zP(z), P^1(b_1), \Box\Omega^1 \to$ and
$S_2 = \forall zP(z), P^1(b_2), \Box\Omega^1 \to \Box\neg\forall yP^1(y)$. Since the sequent $S_1$ is not an axiom,
$R^*T(S) = \bot$ and $CMSat \nvdash S$. 

(b) Let $S = \forall xyP(g(y), f(x, h(x), y)), \Box\Omega \to \Box E_5$, where $\Box\Omega = \Box(E_1 \supset
\exists uE_2^1(u)), \Box(E_3 \supset \neg\Box E_4)$, where $E_1$ is a quasi-elementary formula of the
shape $\exists xwzP(x, f(g(z), w, z))$; $E_2(u)$ is a quasi-elementary formula of the shape
$\forall vP(u, v)$; $E_3$ is a quasi-elementary formula of the shape $\exists xyP(x, y)$; $E_4$ is a
quasi-elementary formula of the shape $\exists uvP(u, v)$; $E_5$ is a quasi-elementary
formula of the shape $\exists u_1v_1P(u_1, v_1)$. So, $S$ is a CM-sequent containing one
regular kernel and one non-regular kernel. Let us perform the preliminary
procedure of CMSat, i.e., reduce the given CM-sequent to the set of saturated ones,
and construct $Re(S)$. Since the system of terms $g(y) = x$; $f(x, h(x), y) =
f(g(z), w, z)$ is unifiable, applying the reduction rules and the rules $(S)$,
$(Sub)$ bottom-up, we get $Re(S) = S_1 = \forall vP(b, v), \Box\Omega \to \Box\exists uvP(u, v)$. Next, construct the
sets $R^*Sub(\Box\Omega)$ and $R^*Sub(S)$. By definition, $R^*Sub(\Box\Omega) = R^*Sub(S) =
\{\forall vP(b, v), \Box\exists uvP(u, v)\}$ (and $|R^*Sub(S)| = 2$). Thus, $S_1$ is a saturated
CM-sequent and we can calculate $Re^k(S_1)$. By definition, $Re^0(S_1) = S_1$. Applying
the reduction rules and the rules $(S)$, $(Sub)$ bottom-up, we get $R^*e^1(S) =
S_2 = \forall vP(b_1, v), \Box\Omega \to \Box\exists uvP(u, v)$. Since $S_1 \succeq S_2$, we have $Re^1(S_1) = \emptyset$,
$Sat\{S_1\} = S_1$ and $CMSat \vdash S$. 

(c) Let $S = P(b), \Box\Omega \to \Box\exists uP(u), \Box\exists vQ(v)$, where $\Box\Omega = \Box(\neg E_1 \supset
\neg\top^1), \Box(\neg E_2 \supset \neg\top^1), \Box(E_1 \supset (\exists uP^1(u) \vee \exists vQ^1(v))), \Box(E_2 \supset (\exists u_1P^1(u_1) \vee
\exists v_1Q^1(v_1)))$; $E_1 = \exists xP(x)$; $E_2 = \exists yQ(y)$ and $\top$ is the truth constant.
So, $S$ is a CM-sequent containing four regular kernels. Let us construct
$R^*Sub(\Box\Omega)$ and $R^*Sub(S)$. By definition, $R^*Sub(\Box\Omega) = \{P(b_1), Q(b_2)\}$ and
$R^*Sub(S) = R^*Sub(\Box\Omega) \cup \{\Box\exists uP(u), \Box\exists vQ(v)\}$ (and $|R^*Sub(S)| = 4$). Since
$P(b) \in R^*Sub(\Box\Omega)$, we get that $S$ is a saturated CM-sequent. Therefore we can start
calculating $Re^k(S)$. By definition, $Re^0(S) = S$. It is easy to verify that $R^*e^1(S) =
\{S_1 = P(b_1), \Box\Omega \to \Box\exists uP(u), \Box\exists vQ(v);\ S_2 = Q(b_2), \Box\Omega \to \Box\exists uP(u), \Box\exists vQ(v)\}$.
Since $S \succeq S_1$, $Re^1(S) = S_2$. Analogously we get $R^*e^2(S) = \{S_3 = P(b_3), \Box\Omega \to
\Box\exists uP(u), \Box\exists vQ(v);\ S_4 = Q(b_4), \Box\Omega \to \Box\exists uP(u), \Box\exists vQ(v)\}$. Since $S \succeq S_3$,
$S_2 \succeq S_4$, we have $Re^2(S) = \emptyset$, $CMSat \vdash S$ and $Sat\{S\} = \{S, S_2\}$. 

In order to justify the decidable calculus CMSat, we introduce a so-called 
invariant calculus CMIN. First we introduce some simple calculi. 

Definition 22 (calculi Log and $G^+$). The calculus Log is obtained from the
calculus $G^*$ (see Definition 7) by dropping the rule $(\Box\to)$. The calculus $G^+$ is
obtained from the calculus $G^*$ by adding:

1) the following rule:

$$\frac{\Gamma \to \Delta, A;\ \ \Gamma \to \Delta, \Box A^1}{\Gamma \to \Delta, \Box A};$$

2) the traditional invertible logical rules $(\wedge\to)$, $(\forall\to)$. 



Definition 23 (invariant calculus CMIN). The invariant calculus CMIN
is obtained from the calculus $G^+$ by adding the following rule:

$$\frac{\Gamma \to \Delta, I;\quad I \to I^1;\quad I \to A}{\Gamma \to \Delta, \Box A}\ (\to\Box).$$

The rule $(\to\Box)$ satisfies the following conditions: 

1) the conclusion of (— >■ □), i.e., the sequent S = P ^ A,aA is such 

that (a) S is a saturated CM-sequent; (b) S € SaT{S} = —>■ 

TTii, dA; . . . ; nf? —>■ iTi„, nA}, where aA is the nuclear of 

SaT{S} (see Definition 21); Aij € { 0 ,oAij} (1 < j < n); 

2) I = y A -(Vi7„)^ A A (□f?)''), where QP = 

i=i 

$QE_1, \dots, QE_n$ if $\Gamma = E_1, \dots, E_n$; $Q \in \{\forall\bar{x}, \exists\bar{x}\}$ (i.e., all the eigen-variables
are correspondingly bound in $\Sigma_{ij}$, $\Pi_{ij}$); $\Gamma^{\wedge}$ ($\Gamma^{\vee}$) means the conjunction
(disjunction, respectively) of formulas from $\Gamma$; 

(3) the left premise of $(\to\Box)$, the sequent $S_1 = \Gamma \to \Delta, I$, is such that
$Log \vdash S_1$;

(4) the middle premise of $(\to\Box)$, the sequent $S_2 = I \to I^1$, is such that
$G^+ \vdash S_2$;

(5) the right premise of $(\to\Box)$, the sequent $S_3 = I \to A$, is such that
$CMIN \vdash S_3$. 



Lemma 13. The problem of finding the invariant formula $I$ in the rule $(\to\Box)$
is decidable. 



Proof. Follows from Lemma 12. 
Example 2. Let $S$ be the saturated CM-sequent from Example 1(c), i.e.,
$S = P(b), \Box\Omega \to \Box\exists uP(u), \Box\exists vQ(v)$, where $\Box\Omega = \Box(\neg E_1 \supset \neg\top^1),
\Box(\neg E_2 \supset \neg\top^1), \Box(E_1 \supset (\exists uP^1(u) \vee \exists vQ^1(v))), \Box(E_2 \supset (\exists u_1P^1(u_1) \vee \exists v_1Q^1(v_1)))$.
From Example 1(c) we get that $Sat\{S\} = \{S, S_2\}$, where $S_2 = Q(b_2), \Box\Omega \to
\Box\exists uP(u), \Box\exists vQ(v)$. Therefore, by definition, $Sat_i\{S\} = \{S, S_2\}$ with two
alternative nuclei: (1) $\Box\exists uP(u)$ or (2) $\Box\exists vQ(v)$. Let us take case (1). Then, by
definition, the invariant formula has the following shape: $I = (\exists xP(x) \vee \exists yQ(y)) \wedge
\neg\Box\exists vQ(v) \wedge (\Box\Omega)^{\wedge}$. It is easy to verify that

$Log \vdash P(b), \Box\Omega \to I, \Box\exists vQ(v)$ (1);

$G^+ \vdash I \to I^1$ (2);

$G^+ \vdash I \to \exists uP(u)$ (3).

Applying $(\to\Box)$ to (1), (2), (3), we get $CMIN \vdash S$. We get the same result
when we take the formula $\Box\exists vQ(v)$ as a nucleus. 

In the same way as in [7] we get 

Theorem 4. Let $S$ be a saturated CM-sequent. Then $CMSat \vdash S \iff
CMIN \vdash S \iff G^*_{L\omega} \vdash S$. 



Lemma 14. The calculus CMIN is decidable. 
Proof. Follows from Theorems 3, 4. 



Theorem 5. (a) The calculi CMSat and CMIN are sound and complete for
the class of saturated CM-sequents.

(b) The calculus CMSat is sound and complete for the class of CM-sequents. 

Proof. Point (a) follows from Theorems 2, 4. Point (b) follows from Theorems 2, 
4 and Lemma 9. 
Abstract. We show how to combine the standard tableau system for the
basic description logic $\mathcal{ALC}$ and Wolper’s tableau calculus for propositional
temporal logic PTL (with the temporal operators ‘next-time’ and
‘until’) in order to design a terminating sound and complete tableau-based
satisfiability-checking algorithm for the temporal description logic
$\mathrm{PTL}_{\mathcal{ALC}}$ of [20] interpreted in models with constant domains. We use the
method of quasimodels [18,16] to represent models with infinite domains,
and the technique of minimal types [11] to maintain these domains constant.
The combination is flexible and can be extended to more expressive
description logics or even to decidable fragments of first-order temporal
logics. 



1 Introduction 

Temporal description logics (TDLs) are knowledge representation formalisms 
intended for dealing with temporal conceptual knowledge. In other words, TDLs 
combine the ability of description logics (DLs) to represent and reason about 
conceptual knowledge with the ability of temporal logics (TLs) to reason about 
time. A dozen TDLs designed in the last decade (see e.g. [15,14,2,20,3,10] and 
survey [1]) showed that the equation TDL = DL + TL may have different, often 
very complex solutions, partly because of the rich choice of DLs and TLs, but 
primarily because of principal difficulties in combining systems; see [7]. With 
rare exceptions, the work so far has concentrated on theoretical foundations 
of TDLs (decidability and undecidability, computational complexity, expressive 
power). The investigation of ‘implementable’ algorithms is still at an embryonic 
stage, especially for the TDLs with non-trivial interactions between their DL and 
TL components. The problem we are facing is as follows: is it possible to combine 
the existing implementable reasoning procedures for the interacting DL and TL 
components into a reasonably efficient (on ‘real world problems’) algorithm for 
their TDL hybrid? As the majority of the existing reasoning mechanisms for DLs 
are based on the tableau approach, a first challenging step would be to combine 
a tableau system for a DL with Wolper’s tableaux [17] for the propositional 
temporal logic PTL. 

The first TDL tableau system was constructed by Schild [14], who merged the
basic description logic $\mathcal{ALC}$ with PTL by allowing applications of the temporal
operator $\mathcal{U}$ (until) and its derivatives only to concepts. For example, he defines
a concept Mortal by taking

Mortal = Living_being $\sqcap$ (Living_being $\mathcal{U}$ $\Box\neg$Living_being),

where $\Box$ means ‘always in the future.’ The resulting language is interpreted
in models based on the flow of time $(\mathbb{N}, <)$ and, for each $n \in \mathbb{N}$, specifying
an $\mathcal{ALC}$-model that describes the state of the knowledge base at moment $n$.
Schild obtains his sound, complete and terminating tableau system (for checking
concept satisfiability) simply by putting together the tableau rules of $\mathcal{ALC}$ and
PTL. The reason behind this ‘trivial’ solution is that, in Schild’s logic, there is no
actual interaction between the temporal operators of PTL and the constructors
of $\mathcal{ALC}$: the logic is the fusion or independent join of its components. 

A more sophisticated combination $\mathrm{PTL}_{\mathcal{ALC}}$ of $\mathcal{ALC}$ and PTL allowing 
applications of temporal and Boolean operators to both concepts and TBox axioms 
was constructed in [20]. Using $\mathrm{PTL}_{\mathcal{ALC}}$, one can express statements like ‘in all 
times all living beings are mortal’ or ‘living beings will never die out completely:’ 

$$\Box(\mathsf{Living\_being} \sqsubseteq \mathsf{Mortal}), \qquad \Box\Diamond\neg(\mathsf{Living\_being} = \bot),$$

where $\Diamond$ means ‘some time in the future.’ The degree of interaction between 
the DL and TL components in $\mathrm{PTL}_{\mathcal{ALC}}$ depends on the ‘domain assumption’ 
the intended models comply with. A tableau system for $\mathrm{PTL}_{\mathcal{ALC}}$ interpreted 
in models with expanding $\mathcal{ALC}$ domains (which means that when moving from 
earlier moments of time to later ones, the domains of $\mathcal{ALC}$-models can get larger 
and larger, but never shrink) was designed in [16]. The interaction between the 
components becomes even stronger if we consider models with constant domains, 
where an introduction of a domain element at moment $n$ forces us to introduce 
the same element at all previous moments as well. This makes the problem 
of constructing tableaux for $\mathrm{PTL}_{\mathcal{ALC}}$ with constant domains considerably more 
difficult. 

The choice of the domain assumption — expanding, varying, decreasing, or 
constant — depends on the knowledge to be represented. One can argue, for in- 
stance, whether the domain element representing a living being A in a model 
exists before A’s birth or after A’s death. However, in many applications such 
as reasoning about temporal entity relationship (ER) diagrams [2,3], expanding 
domains do not suffice and must be replaced by constant ones. Apart from being 
appropriate for applications, the constant domain assumption is the most general 
case in the sense that reasoning with expanding (or varying) domains can 
be reduced to reasoning with constant domains (see e.g. [20]). 

The main aim of this paper is to design a terminating, sound, and complete 
tableau system for checking satisfiability of $\mathrm{PTL}_{\mathcal{ALC}}$-formulas in models with 
constant domains. 

This is achieved by 

- combining (in a modular way) the standard tableaux for $\mathcal{ALC}$ with Wolper’s 
[17] tableaux for PTL, 

- using so-called quasimodel representations of constraint systems, and 

- using so-called minimal type representations of domain elements introduced 
in subsequent states. 

Quasimodels [18,19,20] are abstractions of models representing elements by their 
types and the evolution of elements in time by certain functions called runs. As 
was shown in [16], quasimodels make it possible to cope with $\mathrm{PTL}_{\mathcal{ALC}}$ models 
having infinite $\mathcal{ALC}$ domains (an example showing that $\mathrm{PTL}_{\mathcal{ALC}}$ does not have 
the finite domain property can be found in Section 2). The concept of ‘minimal 
partial types’ is the main new idea of this paper which is used to maintain the 
$\mathcal{ALC}$ domains constant. 

Although the formula-satisfiability problem for $\mathrm{PTL}_{\mathcal{ALC}}$ is rather complex 
(as is shown in [3], it is EXPSPACE-complete), we hope that the tableau system 
constructed in this paper will lead to a ‘reasonably efficient’ implementation of 
the $\mathrm{PTL}_{\mathcal{ALC}}$ reasoning services. However, in order to achieve an acceptable 
run-time behavior, it is still necessary to devise suitable optimization strategies for 
the algorithm. We believe that such strategies can be found, since, as shown in 
e.g. [9], related tableau algorithms are amenable to optimization. 

It is to be noted that the developed approach can be used to design tableau 
algorithms for other combinations of description and modal logics (in particular, 
temporal epistemic logics of [6]). For instance, [11] gives a solution to the open 
problem of Baader and Laux [4] by constructing tableaux for their combination 
of the modal logic K with $\mathcal{ALC}$ interpreted in models with constant domains. 

The paper is accompanied by a technical report [12] containing full proofs of 
all theorems. 



2 Basic Definitions 

We begin by introducing the temporal description logic $\mathrm{PTL}_{\mathcal{ALC}}$ of [20]. 

Let $N_C = \{C_0, C_1, \dots\}$, $N_R = \{R_0, R_1, \dots\}$, and $N_O = \{a_0, a_1, \dots\}$ be 
countably infinite sets of concept names, role names, and object names, 
respectively. $\mathrm{PTL}_{\mathcal{ALC}}$-concepts are defined inductively: all the $C_i$ as well as $\top$ are 
concepts, and if $C$, $D$ are concepts and $R \in N_R$, then $C \sqcap D$, $\neg C$, $\exists R.C$, $\bigcirc C$, 
and $C\,\mathcal{U}\,D$ are concepts. 

$\mathrm{PTL}_{\mathcal{ALC}}$-formulas are defined as follows: if $C$, $D$ are concepts and $a, b \in N_O$, 
then $C = D$, $a : C$, and $aRb$ are atomic formulas; and if $\varphi$ and $\psi$ are formulas, 
then so are $\neg\varphi$, $\varphi \wedge \psi$, $\bigcirc\varphi$, and $\varphi\,\mathcal{U}\,\psi$. 

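The intended models of $\mathrm{PTL}_{\mathcal{ALC}}$ are natural two-dimensional hybrids of 
standard models of $\mathcal{ALC}$ and PTL. More precisely, a $\mathrm{PTL}_{\mathcal{ALC}}$-model is a triple 
$\mathfrak{M} = (\mathbb{N}, <, I)$, where $<$ is the standard ordering of $\mathbb{N}$ and $I$ a function 
associating with each $n \in \mathbb{N}$ an $\mathcal{ALC}$-model 

$$I(n) = \left\langle \Delta, R_0^{I(n)}, \dots, C_0^{I(n)}, \dots, a_0^{I(n)}, \dots \right\rangle,$$

in which $\Delta$, the (constant) domain of $\mathfrak{M}$, is a non-empty set, the $R_i^{I(n)}$ are binary 
relations on $\Delta$, the $C_i^{I(n)}$ subsets of $\Delta$, and the $a_i^{I(n)}$ are elements of $\Delta$ such that 
$a_i^{I(n)} = a_i^{I(m)}$, for every $n, m \in \mathbb{N}$. 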

(Note that in the given definition, the object names are assumed to be global, 
while the concept names are interpreted locally. Neither of these assumptions is 
essential; in particular, global concepts can be defined via local ones and U.) 

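The extension $C^{I(n)}$ of a concept $C$ in $\mathfrak{M}$ at a moment $n$ is defined in the 
following way: 

$$
\begin{aligned}
\top^{I(n)} &= \Delta; \\
(C \sqcap D)^{I(n)} &= C^{I(n)} \cap D^{I(n)}; \\
(\neg C)^{I(n)} &= \Delta \setminus C^{I(n)}; \\
(\exists R.C)^{I(n)} &= \{ d \in \Delta \mid \exists d' \in C^{I(n)} \;\, d\,R^{I(n)}\,d' \}; \\
(C\,\mathcal{U}\,D)^{I(n)} &= \{ d \in \Delta \mid \exists m \ge n\, ( d \in D^{I(m)} \;\&\; \forall k\, ( n \le k < m \to d \in C^{I(k)} ) ) \}; \\
(\bigcirc C)^{I(n)} &= C^{I(n+1)}.
\end{aligned}
$$

The truth-relation $\mathfrak{M}, n \models \varphi$ for the Boolean operators is standard and 

$$
\begin{aligned}
\mathfrak{M}, n \models C = D \quad &\text{iff} \quad C^{I(n)} = D^{I(n)}; \\
\mathfrak{M}, n \models a : C \quad &\text{iff} \quad a^{I(n)} \in C^{I(n)}; \\
\mathfrak{M}, n \models aRb \quad &\text{iff} \quad a^{I(n)}\,R^{I(n)}\,b^{I(n)}; \\
\mathfrak{M}, n \models \varphi\,\mathcal{U}\,\psi \quad &\text{iff} \quad \exists m \ge n\, ( \mathfrak{M}, m \models \psi \;\&\; \forall k\, ( n \le k < m \to \mathfrak{M}, k \models \varphi ) ); \\
\mathfrak{M}, n \models \bigcirc\varphi \quad &\text{iff} \quad \mathfrak{M}, n+1 \models \varphi.
\end{aligned}
$$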

The only reasoning task we consider in this paper is satisfiability of 
$\mathrm{PTL}_{\mathcal{ALC}}$-formulas, a formula $\varphi$ being satisfiable if there are a model $\mathfrak{M}$ and a moment 
$n \in \mathbb{N}$ such that $\mathfrak{M}, n \models \varphi$. Other standard inference problems for $\mathrm{PTL}_{\mathcal{ALC}}$ 
(concept satisfiability, subsumption, ABox consistency, etc.) can be easily 
reduced to satisfiability of formulas. 

There are two main difficulties in designing a tableau system for $\mathrm{PTL}_{\mathcal{ALC}}$. 
First, as was mentioned in the introduction, there exist formulas satisfiable only 
in models with infinite domains. For example, such is the conjunction of the 
formulas 

$$\Box\neg((C \sqcap \bigcirc\neg C) = \bot), \qquad \Box(\neg C \sqsubseteq \Box\neg C),$$

where $\Box C = \neg(\top\,\mathcal{U}\,\neg C)$ and $\bot = \neg\top$. To tackle this difficulty, we employ the 
standard tableaux for $\mathcal{ALC}$ for constructing finite representations of infinite 
models and keep track of the development of their elements in time by using 
quasimodels as introduced in [18,20,16]. 

The second difficulty is that at moment $n + 1$ the $\mathcal{ALC}$ tableau algorithm 
can introduce an element which does not exist at moment $n$. To ensure that all 
elements always have their immediate predecessors, at each time point we create 
certain ‘marked’ elements satisfying as few conditions as possible, and use them 
as those predecessors if necessary. 

3 Constraint Systems 

In this section, we introduce constraint systems which serve a two-fold purpose. 
First, they form a basis for defining quasimodels, which, in contrast to [20], are 
defined purely syntactically. Second, constraint systems are the underlying data 
structure of the tableau algorithm to be devised. Intuitively, a constraint system 
describes an $\mathcal{ALC}$-model. 

In what follows, without loss of generality we assume that all equalities are of 
the form $C = \top$. ($C = D$ is clearly equivalent to $(\neg(C \sqcap \neg D) \sqcap \neg(D \sqcap \neg C)) = \top$.) 
Often we shall write $C \ne \top$ instead of $\neg(C = \top)$. 
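
The normalisation of equalities can be illustrated by a small Python sketch. It is not part of the paper's algorithm: the nested-tuple encoding of concepts and the helper names (`normalize_equality`, `extension`, `subsets`) are assumptions made only for this illustration, and the evaluator covers the Boolean constructors of a single $\mathcal{ALC}$-model, without roles or time.

```python
from itertools import combinations

# Hypothetical encoding: ("top",), ("name", "C"), ("not", C), ("and", C, D).
TOP = ("top",)


def neg(c):
    return ("not", c)


def conj(c, d):
    return ("and", c, d)


def normalize_equality(c, d):
    """Return a concept E such that C = D holds exactly when E = TOP holds."""
    return conj(neg(conj(c, neg(d))), neg(conj(d, neg(c))))


def extension(concept, domain, names):
    """Extension of a Boolean concept in one fixed model (no roles, no time)."""
    tag = concept[0]
    if tag == "top":
        return frozenset(domain)
    if tag == "name":
        return frozenset(names[concept[1]])
    if tag == "not":
        return frozenset(domain) - extension(concept[1], domain, names)
    if tag == "and":
        left = extension(concept[1], domain, names)
        return left & extension(concept[2], domain, names)
    raise ValueError(f"unsupported constructor: {tag}")


def subsets(items):
    """All subsets of a finite collection, as frozensets."""
    items = list(items)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]
```

Checking every pair of extensions over a three-element domain confirms that the normalised concept denotes the whole domain exactly when the two original extensions coincide.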

Constraint systems are formulated in the following language $L_C$. Let $V$ be a 
fixed countably infinite set of (individual) variables. We assume $V$ to be disjoint 
from the set $N_O$ of object names. Elements of $V \cup N_O$ are called $L_C$-terms. If 
$\varphi$ is a $\mathrm{PTL}_{\mathcal{ALC}}$-formula, $C$ a concept, $R$ a role, and $x$, $y$ are $L_C$-terms, then $\varphi$, 
$x : C$, and $xRy$ are called $L_C$-formulas. 

We assume that $V$ comes equipped with a well-order $<_V$. Let $X$ be a 
non-empty subset of $V$. Then $\min(X)$ denotes the first variable in $X$ with respect 
to $<_V$. Variables may occur in constraint systems either marked or unmarked; 
certain formulas may occur $\mathcal{U}$-marked or $\mathcal{U}$-unmarked. As we said above, marked 
variables are used to deal with constant domains. $\mathcal{U}$-markedness will be explained 
after the saturation rules have been introduced. 

Definition 1. A constraint system $S$ is a finite (non-empty) set of $L_C$-formulas 
such that 

- each variable in $S$ is either marked or unmarked, 

- each formula in $S$ of the form $\varphi\,\mathcal{U}\,\psi$ or $x : (C\,\mathcal{U}\,D)$ is either $\mathcal{U}$-marked or 
$\mathcal{U}$-unmarked, 

- $S$ contains $\min(V) : \top$. 

We will say that a constraint system $S$ is saturated if it satisfies a number of 
closure conditions. With a few exceptions, these conditions require that if $S$ 
contains a formula $\varphi$ of a certain form, then $S$ contains some other formulas 
composed from subformulas and subconcepts of $\varphi$ (possibly using additional 
negation and $\bigcirc$). For example, $S$ is closed under conjunction if whenever $S$ 
contains $\psi_1 \wedge \psi_2$, then it contains both conjuncts $\psi_1$ and $\psi_2$ as well. We formulate 
the closure conditions as the saturation rules in Figs. 1-3. Later these rules will 
also be used as rules of our tableau algorithm. A constraint system $S$ is called 
saturated if none of the saturation rules can be applied to it. 
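$\mathcal{ALC}$-rules for formulas 

$S \to_{\neg\neg} \{\varphi\} \cup S$ if 
    $\neg\neg\varphi \in S$ and $\varphi \notin S$ 

$S \to_{\wedge} \{\varphi, \psi\} \cup S$ if 
    $\varphi \wedge \psi \in S$ and $\{\varphi, \psi\} \not\subseteq S$ 

$S \to_{\neg\wedge} \{\theta\} \cup S$ if 
    $\neg(\varphi \wedge \psi) \in S$, $\neg\varphi \notin S$, and $\neg\psi \notin S$ 
    $\theta = \neg\varphi$ or $\theta = \neg\psi$ 

Temporal rules for formulas 

$S \to_{\neg\bigcirc} \{\bigcirc\neg\varphi\} \cup S$ if 
    $\neg\bigcirc\varphi \in S$ and $\bigcirc\neg\varphi \notin S$ 

$S \to_{\mathcal{U}} X \cup S$ if 
    $\varphi\,\mathcal{U}\,\psi$ appears $\mathcal{U}$-unmarked in $S$ 
    $X = \{\psi\}$ or $X = \{\varphi, \bigcirc(\varphi\,\mathcal{U}\,\psi)\}$ 
    $\varphi\,\mathcal{U}\,\psi$ is $\mathcal{U}$-marked in $X \cup S$ 

$S \to_{\neg\mathcal{U}} X \cup S$ if 
    $\neg(\varphi\,\mathcal{U}\,\psi) \in S$, $\{\neg\psi, \neg\varphi\} \not\subseteq S$, and $\{\neg\psi, \bigcirc\neg(\varphi\,\mathcal{U}\,\psi)\} \not\subseteq S$ 
    $X = \{\neg\psi, \neg\varphi\}$ or $X = \{\neg\psi, \bigcirc\neg(\varphi\,\mathcal{U}\,\psi)\}$ 



Fig. 1. Saturation rules for formulas. 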



A few remarks below will help the reader to understand the rules. As the 
temporal part of our tableaux is based on Wolper’s [17] algorithm for PTL, 
the temporal saturation rules resemble those of Wolper’s. Note also that the 
saturation rules $\to_{\neg\wedge}$, $\to_{\mathcal{U}}$, $\to_{\neg\mathcal{U}}$, $\to_{\neg\sqcap}$, $\to_{\mathcal{U}c}$ and $\to_{\neg\mathcal{U}c}$ are disjunctive: 
they have more than one possible outcome. In this section, it is convenient to view 
these rules as nondeterministic. Later, when the saturation rules are regarded 
as tableau rules, we will apply them deterministically, i.e., consider all of their 
possible outcomes. Unless otherwise stated, we assume rules to introduce 
$\mathcal{U}$-unmarked formulas. Intuitively, $\mathcal{U}$-markedness is needed to ensure that the $\to_{\mathcal{U}}$ 
and $\to_{\mathcal{U}c}$ rules are applied exactly once to each formula $\varphi\,\mathcal{U}\,\psi$ and $x : C\,\mathcal{U}\,D$, 
respectively. For example, we must ensure that the $\to_{\mathcal{U}}$ rule is applied (once) 
to $\varphi\,\mathcal{U}\,\psi$ even if the constraint system under consideration already contains $\varphi$ 
and $\bigcirc(\varphi\,\mathcal{U}\,\psi)$. This is required to make the tableau algorithm complete (see [17, 
16] for an example and a more detailed discussion). 
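
The role of $\mathcal{U}$-markedness in the until rule for formulas can be sketched in Python as follows. This is a simplified illustration rather than the paper's implementation: formulas are encoded as nested tuples, a constraint system and its set of $\mathcal{U}$-marked formulas are plain frozensets, and the name `apply_until_rule` is hypothetical.

```python
def apply_until_rule(system, u_marked, until):
    """Apply the disjunctive until rule to ("until", phi, psi).

    `system` and `u_marked` are frozensets of formulas (assumed encoding).
    Returns the list of possible outcomes as (system, u_marked) pairs; the
    list is empty when the formula is already U-marked, because the rule
    is only applicable to U-unmarked formulas.
    """
    tag, phi, psi = until
    if tag != "until" or until not in system:
        raise ValueError("the rule needs an until-formula from the system")
    if until in u_marked:
        return []
    marked = u_marked | {until}
    # First outcome: the eventuality is fulfilled now.
    first = (system | {psi}, marked)
    # Second outcome: phi holds now and the until is postponed by one step.
    second = (system | {phi, ("next", until)}, marked)
    return [first, second]
```

Note that the rule is keyed on the mark and not on the presence of the consequences: a system that already contains `phi` and `("next", until)` still yields both outcomes on the first application, which mirrors the completeness remark above.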

As was already noted, marked variables are needed to cope with constant 
domains. For now, we just observe that the disjunctive rules treat marked and 
unmarked variables differently. Intuitively, in case of marked variables it is not 
sufficient to consider only one of the possible outcomes of the disjunctive rule 
application per constraint system, but we must additionally consider both 
possible outcomes together. For example, if we have $S = \{v : E\,\mathcal{U}\,F,\; v : \neg(C \sqcap D)\}$ 
and $v$ is marked in $S$, then we should consider not only the obvious saturations 
$S_1 = S \cup \{v : \neg C\}$ and $S_2 = S \cup \{v : \neg D\}$, but also 

$$S_3 = \{v : E\,\mathcal{U}\,F,\; v : \neg(C \sqcap D),\; v : \neg C,\; v' : E\,\mathcal{U}\,F,\; v' : \neg(C \sqcap D),\; v' : \neg D\},$$
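$\mathcal{ALC}$-rules for concepts 

$S \to_{\neg\neg c} \{x : C\} \cup S$ if 
    $x : \neg\neg C \in S$ and $x : C \notin S$ 

$S \to_{\sqcap} \{x : C, x : D\} \cup S$ if 
    $x : C \sqcap D \in S$ and $\{x : C, x : D\} \not\subseteq S$ 

$S \to_{\neg\sqcap} X \cup S$ if 
    $x : \neg(C \sqcap D) \in S$, $x : \neg C \notin S$ and $x : \neg D \notin S$ 
    $X = \{x : \neg C\}$ or $X = \{x : \neg D\}$ or 
    $x$ marked in $S$ and $X = (\mathrm{copy}(S, x, v) \cup \{x : \neg C, v : \neg D\})$ 
    where $v$ is marked in $X \cup S$ and the first new variable from $V$ 

$S \to_{=} \{x : C\} \cup S$ if 
    $C = \top \in S$, $x$ occurs in $S$, and $x : C \notin S$ 

$S \to_{\neg\exists} \{y : \neg C\} \cup S$ if 
    $x : \neg\exists R.C \in S$, $xRy \in S$, and $y : \neg C \notin S$ 

Temporal rules for concepts 

$S \to_{\neg\bigcirc c} \{x : \bigcirc\neg C\} \cup S$ if 
    $x : \neg\bigcirc C \in S$ and $x : \bigcirc\neg C \notin S$ 

$S \to_{\mathcal{U}c} X \cup S$ if 
    $x : C\,\mathcal{U}\,D$ appears $\mathcal{U}$-unmarked in $S$ 
    $X = \{x : D\}$ or $X = \{x : C, x : \bigcirc(C\,\mathcal{U}\,D)\}$ or 
    $x$ marked in $S$ and $X = (\mathrm{copy}(S, x, v) \cup \{x : D, v : C, v : \bigcirc(C\,\mathcal{U}\,D)\})$ 
    where $v$ is marked in $X \cup S$ and the first new variable from $V$ 
    $x : C\,\mathcal{U}\,D$ and $v : C\,\mathcal{U}\,D$ (if introduced) are $\mathcal{U}$-marked in $X \cup S$ 

$S \to_{\neg\mathcal{U}c} X \cup S$ if 
    $x : \neg(C\,\mathcal{U}\,D) \in S$, $\{x : \neg D, x : \neg C\} \not\subseteq S$, and $\{x : \neg D, x : \bigcirc\neg(C\,\mathcal{U}\,D)\} \not\subseteq S$ 
    $X = \{x : \neg D, x : \neg C\}$ or $X = \{x : \neg D, x : \bigcirc\neg(C\,\mathcal{U}\,D)\}$ or 
    $x$ marked in $S$ and $X = (\mathrm{copy}(S, x, v) \cup \{x : \neg D, x : \neg C, v : \neg D, v : \bigcirc\neg(C\,\mathcal{U}\,D)\})$ 
    where $v$ is marked in $X \cup S$ and the first new variable from $V$ 



Fig. 2. Non-generating saturation rules for concepts. 



$S \to_{\ne} \{v : \neg C\} \cup S$ if 
    $C \ne \top \in S$ and there exists no $y$ with $y : \neg C \in S$ 
    $v$ is the first new variable from $V$ 

$S \to_{\exists} \{v : C, xRv\} \cup S$ if 
    $x : \exists R.C \in S$, there is no $y$ such that $\{xRy, y : C\} \subseteq S$ and $x$ is not blocked in $S$ 
    by an unmarked variable; $v$ is unmarked and the first new variable from $V$ 



Fig. 3. Generating saturation rules. 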
where $v$ is marked in $S_1$, $S_2$, $S_3$ and $v'$ is marked in $S_3$. In $S_3$, we created a 
‘marked copy’ $v'$ of $v$ and saturated $v$ in one possible way and $v'$ in the other. 
In the formulation of the rules, copies are made by using $\mathrm{copy}(S, v, v')$ which 
denotes the set $\{v' : C \mid v : C \in S\}$, where $v$ is marked and $v'$ is a fresh variable 
(not used in $S$). Note that by definition of $L_C$-formulas, marked variables do not 
occur in complex formulas such as $x : C \wedge x : D$ and thus such formulas need not 
be considered for copy. We generally assume that copies preserve $\mathcal{U}$-markedness: 
in the example above, $v' : E\,\mathcal{U}\,F$ is $\mathcal{U}$-marked in $S_3$ iff $v : E\,\mathcal{U}\,F$ is $\mathcal{U}$-marked in $S$. 
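
The copy operation is simple enough to write down directly. The following Python sketch assumes that the membership assertions of a constraint system are given as `(term, concept)` pairs; the names `copy_variable` and `copy_u_marks` are hypothetical, and role assertions and PTL formulas are left out for brevity.

```python
def copy_variable(system, v, v_new):
    """copy(S, v, v'): the set {v' : C | v : C in S}.

    `system` is a set of (term, concept) pairs (assumed encoding) and
    `v_new` must not yet occur in it.
    """
    used = {term for (term, _) in system}
    if v_new in used:
        raise ValueError("the copy target must be a fresh variable")
    return frozenset((v_new, c) for (term, c) in system if term == v)


def copy_u_marks(u_marked, v, v_new):
    """Transfer U-marks from assertions about v to their copies about v'."""
    return frozenset((v_new, c) for (term, c) in u_marked if term == v)
```

With `S` containing the two assertions about `v` from the example, the union of `S`, `copy_variable(S, "v", "v'")` and the two new negated assertions has six elements, one for each member of $S_3$.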

To ensure termination of repeated applications of the saturation rules, we use 
a ‘blocking’ technique, cf. [5]. Blocked variables are defined as follows. For now, 
assume that each constraint system is equipped with a strict partial order $\ll$ on 
the set of terms. Say that a variable $v$ in a constraint system $S$ is blocked by a 
variable $v'$ in $S$ if $v' \ll v$ and $\{C \mid v : C \in S\} \subseteq \{C \mid v' : C \in S\}$. Later, when we 
consider sequences of constraint systems obtained by repeated rule applications, 
$\ll$ will denote the order of introduction of terms. Note that only variables, rather 
than object names, may block terms. Also, only variables can be blocked. 
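
A minimal Python sketch of this blocking test is given below. It assumes that the membership assertions are `(term, concept)` pairs, that `variables` is the set of variables (object names excluded), and that `order` is a list of all terms in their order of introduction, i.e. a linear instance of the order $\ll$. The function names are illustrative only.

```python
def concepts_of(system, term):
    """Return {C | term : C in S} for a system given as (term, concept) pairs."""
    return frozenset(c for (t, c) in system if t == term)


def blockers(system, variables, order, v):
    """Variables introduced before v whose concept set includes that of v."""
    if v not in variables:
        return []  # only variables can be blocked
    mine = concepts_of(system, v)
    earlier = order[:order.index(v)]
    # Object names are skipped: only variables may block.
    return [u for u in earlier
            if u in variables and mine <= concepts_of(system, u)]


def is_blocked(system, variables, order, v):
    return bool(blockers(system, variables, order, v))
```

Returning the list of blockers rather than a Boolean makes it easy to ask the more specific question used by the generating rule for existential restrictions, namely whether some blocker is unmarked.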

A constraint system $S$ is said to be clash-free if it contains no formulas $\neg\top$ 
and $x : \neg\top$ and neither a pair of the form $x : C$, $x : \neg C$, nor a pair of the form 
$\varphi$, $\neg\varphi$. We write $S \to_s S'$ to say that the constraint system $S'$ can be obtained 
from $S$ by an application of the saturation rule $\to_s$. 
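
The clash conditions translate into a direct membership test. The sketch below assumes a tuple encoding in which `("member", x, C)` stands for $x : C$, `("role", x, R, y)` for $xRy$, and any other tuple for a PTL-style formula; `has_clash` is a hypothetical name.

```python
TOP = ("top",)


def has_clash(system):
    """Return True if the constraint system contains one of the clashes."""
    for f in system:
        if f == ("not", TOP):
            return True  # the formula "not top"
        if f[0] == "member":
            _, x, c = f
            if c == ("not", TOP):
                return True  # x : not top
            if ("member", x, ("not", c)) in system:
                return True  # x : C together with x : not C
        elif f[0] != "role":
            if ("not", f) in system:
                return True  # a formula together with its negation
    return False
```

A constraint system is clash-free in the sense above exactly when `has_clash` returns `False` for it.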

Let $S_0, \dots, S_n$ be a sequence of constraint systems such that, for every $i < n$, 
there is a saturation rule $\to_s$ for which $S_i \to_s S_{i+1}$ and in case $\to_s$ is a 
generating rule, no non-generating rule is applicable to $S_i$ (where non-generating 
rules are from Fig. 1 and 2 while generating rules are from Fig. 3). Then we say 
that $S_0, \dots, S_n$ is built according to the saturation strategy. If this is the case 
and no saturation rule is applicable to $S_n$, then we call $S_n$ a saturation of $S_0$. 

4 Quasimodels 

As was already said, $\mathrm{PTL}_{\mathcal{ALC}}$ does not have the finite domain property, and 
so our tableau algorithm constructs abstractions of models, called quasimodels, 
rather than models themselves. 

Quasimodels are based on the idea of concept types. A concept type is simply 
a set of concepts that are ‘relevant’ to the tested formula and satisfied by an 
element of the domain. The ‘fragment’ of relevant concepts and formulas is 
defined as follows. Let $\Phi$ be a set of formulas. Denote by $Sb(\Phi)$ the set of all 
subformulas of formulas in $\Phi$, by $ob(\Phi)$ the set of all object names that occur 
in $\Phi$, by $rol(\Phi)$ the set of all roles in $\Phi$, and by $con(\Phi)$ the set of all concepts 
in $\Phi$. If $\sharp$ is a unary operator, say, $\neg$ or $\bigcirc$, then $\sharp\Phi$ is the union of $\Phi$ and 
$\{\sharp\varphi \mid \varphi \in \Phi\}$. The fragment $Fg(\Phi)$ generated by $\Phi$ is defined as the union of the 
following four sets: $ob(\Phi)$, $rol(\Phi)$, $\bigcirc(\neg con(\Phi \cup \{\top\}))$ and $\bigcirc(\neg Sb(\Phi \cup \{\top\}))$. 

Roughly, a quasimodel is a sequence $(S_n \mid n \in \mathbb{N})$ of saturated constraint 
systems that satisfies certain conditions which control interactions between the 
$S_n$ and ensure that quasimodels can be reconstructed into real models. Unlike 
standard tableaux, where a variable usually represents an element of a model, a 
variable in a quasimodel represents a concept type. More precisely, if a constraint 
system contains a variable $v$, then the corresponding $\mathcal{ALC}$-models contain at least 
one, but potentially (infinitely) many, elements of the type represented by $v$. 
As our $\mathrm{PTL}_{\mathcal{ALC}}$-models have constant domains, we need some means to keep 
track of the types representing the same element at different moments of time. 
This can be done using a function $r$, called a run, which associates with each 
$n \in \mathbb{N}$ a term $r(n)$ from $S_n$. Thus $r(0), r(1), \dots$ are type representations of one 
and the same element at moments $0, 1, \dots$. 

We are in a position now to give precise definitions. Fix a $\mathrm{PTL}_{\mathcal{ALC}}$-formula $\vartheta$. 

Definition 2. A quasiworld for $\vartheta$ is a saturated clash-free constraint system $S$ 
satisfying the following conditions: 

- $\{a \mid \exists C\, (a : C) \in S\} = ob(\vartheta)$, 

- $con(S) \subseteq Fg(\vartheta)$ and $rol(S) \subseteq Fg(\vartheta)$, 

- for every formula $\varphi \in S$, if $\varphi$ is a $\mathrm{PTL}_{\mathcal{ALC}}$-formula then $\varphi \in Fg(\vartheta)$, 

- all variables in $S$ are unmarked. 

One should not be confused by the fact that all variables in quasiworlds are unmarked. 
Marked variables are, as we shall see later on, important for the construction of 
a quasimodel. After the construction, marked variables can simply be ‘unmarked’ 
(note that this operation preserves saturatedness of constraint systems). 

Definition 3. A sequence $Q = (S_n \mid n \in \mathbb{N})$ of quasiworlds for $\vartheta$ is called a 
$\vartheta$-sequence. A run in $Q$ is a function $r$ associating with each $n \in \mathbb{N}$ a term $r(n)$ 
from $S_n$ such that 

- for every $m \in \mathbb{N}$ and every concept $C$, if $(r(m) : \bigcirc C) \in S_m$ then we have 
$(r(m+1) : C) \in S_{m+1}$, 

- for all $m \in \mathbb{N}$, if $(r(m) : C\,\mathcal{U}\,D) \in S_m$ then there is $k \ge m$ such that 
$(r(k) : D) \in S_k$ and $(r(i) : C) \in S_i$ whenever $m \le i < k$. 

Definition 4. A $\vartheta$-sequence $Q$ is called a quasimodel for $\vartheta$ if the following hold: 

- for every object name $a$ in $Q$, the function $r_a$ defined by $r_a(n) = a$, for all 
$n \in \mathbb{N}$, is a run in $Q$, 

- for every $n \in \mathbb{N}$ and every variable $v$ in $S_n$, there is a run $r$ in $Q$ such that 
$r(n) = v$, 

- for every $n \in \mathbb{N}$ and every $\bigcirc\varphi \in S_n$, we have $\varphi \in S_{n+1}$, 

- for every $n \in \mathbb{N}$ and every $(\varphi\,\mathcal{U}\,\psi) \in S_n$, there is $m \ge n$ such that $\psi \in S_m$ 
and $\varphi \in S_k$ whenever $n \le k < m$. 

We say that $\vartheta$ is quasi-satisfiable if there are a quasimodel $Q = (S_n \mid n \in \mathbb{N})$ for 
$\vartheta$ and $n \in \mathbb{N}$ such that $\vartheta \in S_n$. 



Theorem 1. A $\mathrm{PTL}_{\mathcal{ALC}}$-formula $\vartheta$ is satisfiable iff $\vartheta$ is quasi-satisfiable. 







C. Lutz et al. 



5 The Tableau Algorithm 

In this section, we present a tableau algorithm for checking satisfiability of 
$\mathrm{PTL}_{\mathcal{ALC}}$-formulas in models with constant domains. Before going into 
technical details, we explain informally how quasimodels for an input formula are 
constructed and, in particular, how marked variables help to maintain constant 
domains. 

Intuitively, marked variables represent so-called ‘minimal types.’ If a 
constraint system $S$ contains marked variables $v_1, \dots, v_k$ then every element of an 
$\mathcal{ALC}$-model corresponding to $S$ is described by one of the $v_i$. It should now be 
clear why the disjunctive saturation rules must be applied in a special way to 
marked variables. Consider, for example, the $\to_{\neg\sqcap}$ rule and assume that there 
is a single marked variable $v_m$ in $S$ and that $v_m : \neg(C \sqcap D) \in S$. In the context 
of minimal types, this means that every element in corresponding $\mathcal{ALC}$-models 
satisfies $\neg(C \sqcap D)$. From this, however, it does not follow that every element 
satisfies $\neg C$ or that every element satisfies $\neg D$. Hence, the $\to_{\neg\sqcap}$ rule cannot 
be applied in the same way as for unmarked variables. 

Here is a simple example illustrating the construction of quasimodels with 
minimal types. Consider the formula 

$$\vartheta = ((\neg(\bigcirc C \sqcap \bigcirc\neg C)) = \top) \wedge a : \bigcirc\exists R.C.$$

With this formula we associate the initial constraint system $S_\vartheta = \{\vartheta, v_m : \top\}$ 
containing $\vartheta$ and a single marked variable $v_m$. By applying saturation rules, 
we obtain then the constraint system $S_0 = \{a : \bigcirc\exists R.C,\; v_m : \bigcirc C,\; v'_m : \bigcirc\neg C\}$ 
(slightly simplified for brevity) that describes the $\mathcal{ALC}$-model for time moment 0. 
The constraint system for moment 1 is $\{a : \exists R.C,\; v_1 : C,\; v_2 : \neg C,\; v_m : \top\}$ (where 
$v_m$ is the only marked variable) which can then be extended to the system 
$S_1 = \{a : \exists R.C,\; v_m : \top,\; v_1 : C,\; v_2 : \neg C,\; aRv,\; v : C\}$ by the saturation rules. 
Note that we introduced a new (unmarked) variable $v$. Every element $d$ which is 
of type $v$ at moment 1 must, according to the constant domain assumption, 
also exist at moment 0. But what is the type of $d$ at that moment (i.e., the 
‘predecessor type’ of $d$ at 1)? By the definition of minimal types, we must only 
choose among marked variables. So either $d$ is of type $v_m$ at 0, which means that 
we must add $v : C$ to $S_1$, or $d$ is of type $v'_m$ at 0, and so we must add $v : \neg C$ 
to $S_1$. The former choice yields an (initial fragment of a) quasimodel, while the 
latter leads to a clash. For a more detailed discussion we refer the reader to [11]. 

We can now define the tableau algorithm. In general, tableau algorithms try 
to construct a (quasi)model for the input formula by repeatedly applying tableau 
rules to an appropriate data structure. Let us first introduce this data structure. 

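Definition 5. A tableau for a $\mathrm{PTL}_{\mathcal{ALC}}$-formula $\vartheta$ is a triple $\mathcal{G} = (G, \prec, l)$, where 
$(G, \prec)$ is a finite tree and $l$ a labelling function associating with each $g \in G$ a 
constraint system $l(g)$ for $\vartheta$ such that $S_\vartheta = \{\vartheta\} \cup \{\min(V) : \top\} \cup \{a : \top \mid a \in 
ob(\vartheta)\}$ is associated with the root of $\mathcal{G}$, where $\min(V)$ is marked and $\vartheta$ is 
$\mathcal{U}$-unmarked if it is of the form $\varphi\,\mathcal{U}\,\psi$ or $x : (C\,\mathcal{U}\,D)$. 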
To decide whether $\vartheta$ is satisfiable, the tableau algorithm for $\mathrm{PTL}_{\mathcal{ALC}}$ goes 
through two phases. In the first phase, the algorithm starts with an initial tableau 
$\mathcal{G}_\vartheta$ and exhaustively applies the tableau rules to be defined below. Eventually we 
obtain a tableau $\mathcal{G}$ to which no more rule is applicable; it is called a completion 
of $\mathcal{G}_\vartheta$. In the second phase, we eliminate those parts of $\mathcal{G}$ that contain obvious 
contradictions or eventualities which are not realized. After that we are in a 
position to deliver a verdict: $\vartheta$ is satisfiable iff the resulting tableau $\mathcal{G}'$ is not 
empty, i.e., iff the root of $\mathcal{G}$ has not been eliminated. 

Let us first concentrate on phase 1. The initial tableau $\mathcal{G}_\vartheta$ associated with $\vartheta$ 
is defined as $(\{g_\vartheta\}, \prec_\vartheta, l_\vartheta)$, where $\prec_\vartheta = \emptyset$ and $l_\vartheta(g_\vartheta) = S_\vartheta$. To define the tableau 
rules, we require a number of auxiliary notions. Let $S$ be a constraint system and 
$x$ a term occurring in $S$. Denote by $A_x(S)$ the set $\{C \mid (x : \bigcirc C) \in S\}$ and define 
an equivalence relation $\sim_S$ on the set of variables (not terms) in $S$ by taking 
$v \sim_S u$ iff $A_v(S) = A_u(S)$. The equivalence class generated by $v$ is denoted by 
$[v]_S$. Finally, let $S/{\sim_S}$ denote the set of all equivalence classes $[v]_S$. 
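
The grouping of variables by their next-time concepts can be sketched in Python as follows. The sketch assumes that membership assertions are `(term, concept)` pairs, that a next-time concept is encoded as `("next", C)`, and that `variables` lists the variables of the system in the well-order on variables, so that the first member of each class is its minimum. The function names are hypothetical.

```python
def next_concepts(system, term):
    """A_x(S): the set of all C such that x : next C occurs in S."""
    return frozenset(c[1] for (t, c) in system
                     if t == term and c[0] == "next")


def next_classes(system, variables):
    """Partition the variables by their set of next-time concepts.

    `variables` must be listed in the well-order; each class keeps that order.
    """
    classes = {}
    for v in variables:
        classes.setdefault(next_concepts(system, v), []).append(v)
    return list(classes.values())


def representatives(system, variables):
    """The minimal variable of each equivalence class."""
    return [cls[0] for cls in next_classes(system, variables)]
```

Only the representatives are carried over when a new time point is generated, which is what keeps the number of variables per state bounded by the number of possible sets of next-time concepts.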

Similar to the local blocking strategy on variables of constraint systems, we 
need a global blocking strategy on the nodes of tableaux. To define this kind of 
blocking, it is convenient to abstract from variable names. 

Let $S$ and $S'$ be constraint systems. $S'$ is called a variant of $S$ if there 
is a bijective function $\pi$ from the variables occurring in $S$ onto the variables 
occurring in $S'$ which respects markedness (i.e., unmarked variables are mapped 
to unmarked variables and marked variables to marked variables) and $S'$ is 
obtained from $S$ by replacing each variable $v$ from $S$ with $\pi(v)$. In this case $\pi$ is 
called a renaming. 
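
For small constraint systems the variant test can be carried out by brute force over all bijections. The following Python sketch does exactly that; it is an illustration and not an efficient procedure, since the search is factorial in the number of variables. It assumes a nested-tuple encoding of formulas in which variable names are strings that do not coincide with any other symbol, it ignores $\mathcal{U}$-markedness of formulas, and the names `rename` and `find_renaming` are hypothetical.

```python
from itertools import permutations


def rename(formula, mapping):
    """Replace variables in a nested-tuple formula according to `mapping`."""
    if formula in mapping:
        return mapping[formula]
    if isinstance(formula, tuple):
        return tuple(rename(part, mapping) for part in formula)
    return formula


def find_renaming(s1, marked1, vars1, s2, marked2, vars2):
    """Return a markedness-respecting bijection pi with pi(s1) == s2, or None."""
    if len(vars1) != len(vars2) or len(s1) != len(s2):
        return None
    source = sorted(vars1)
    for image in permutations(sorted(vars2)):
        pi = dict(zip(source, image))
        # Marked variables must map to marked ones, unmarked to unmarked.
        if any((v in marked1) != (pi[v] in marked2) for v in source):
            continue
        if frozenset(rename(f, pi) for f in s1) == frozenset(s2):
            return pi
    return None
```

A practical implementation would more likely compare canonical forms of the two systems than enumerate permutations; the brute-force version is shown only because it follows the definition literally.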

Like constraint systems, tableaux are equipped with a strict partial order $\ll$ 
on the set of nodes which indicates the order in which the nodes of the tableau 
have been introduced. The tableau rules are shown in Fig. 4. Intuitively, the 
$\Rightarrow_{\bigcirc}$ rule generates a new time point, while the other rules infer additional 
knowledge about an already existing time point. For every saturation rule $\to_s$ 
we have a corresponding tableau rule $\Rightarrow_s$. The $\Rightarrow_{\downarrow}$ and $\Rightarrow_{\downarrow'}$ rules deal with 
constant domains and use the notion of ancestor which is defined as follows. 

Let $\mathcal{G} = (G, \prec, l)$ be a tableau for $\vartheta$. A node $g \in G$ is called a state if only 
the $\Rightarrow_{\bigcirc}$ rule is applicable to $g$. The node $g$ is an ancestor of a node $g' \in G$ if 
there is a sequence of nodes $g_0, \dots, g_n$ such that $g_0 = g$, $g_n = g'$, $g_i \prec g_{i+1}$ for 
$i < n$, and $g_0$ is the only state in the sequence. 

As to the $\Rightarrow_{\bigcirc}$ rule, recall that variables represent types rather than 
elements. In view of this, when constructing the next time point, we ‘merge’ 
variables satisfying the same concepts (by using the equivalence classes). Actually, 
this idea is crucial for devising a terminating tableau algorithm despite the lack 
of the finite domain property. The $\Rightarrow_{\downarrow}$ rule formalizes the choice of a 
predecessor type as was sketched in the example above. Since we have to choose a 
predecessor type, the rule behaves similarly to a disjunctive saturation rule, which 
means that we must apply the rule in a different way for marked variables. 
That is why we need the $\Rightarrow_{\downarrow'}$ rule: for marked variables, it considers arbitrary 
combinations of choices of predecessor types. 
$(G, \prec, l) \Rightarrow_s (G', \prec', l')$ 

    if $g$ is a leaf in $G$, the saturation rule $\to_s$ is applicable to $l(g)$, 
    $S_1, \dots, S_n$ are the possible outcomes of the application of $\to_s$ to $l(g)$, 
    $G' = G \uplus \{g_1, \dots, g_n\}$ and, for $1 \le i \le n$, $\prec' = \prec \cup \{(g, g_i)\}$ and $l'(g_i) = S_i$ 

$(G, \prec, l) \Rightarrow_{\bigcirc} (G', \prec', l')$ 

    if $G' = G \uplus \{g'\}$, $\prec' = \prec \cup \{(g, g')\}$ for some leaf $g \in G$, 
    $l'(g')$ is the union of the following sets: 
        $\{a : \top\} \cup \{a : C \mid (a : \bigcirc C) \in l(g)\}$, for $a \in ob(l(g))$, 
        $\{\varphi \mid \bigcirc\varphi \in l(g)\}$, 
        $\{\min([v]_{l(g)}) : \top\} \cup \{\min([v]_{l(g)}) : C \mid (\min([v]_{l(g)}) : \bigcirc C) \in l(g)\}$, 
            for $[v]_{l(g)} \in l(g)/{\sim_{l(g)}}$, 
        $\{v' : \top\}$, 
    where $v'$ is the only marked variable in $l'(g')$, 
    and there is no $g'' \in G$ with $g'' \ll g$ such that $l(g'')$ is a variant of $l(g)$ 
    (i.e., the rule is not blocked) 

$(G, \prec, l) \Rightarrow_{\downarrow} (G', \prec', l')$ 

    if $g$ is a leaf in $G$, $v$ is an unmarked variable in $l(g)$, $g'$ is the ancestor of $g$, 
    for no term $x$ in $l(g')$ do we have 
        $\{C \mid (x : \bigcirc C) \in l(g')\} \subseteq \{C \mid (v : C) \in l(g)\}$, 
    $v_1, \dots, v_n$ are the marked variables in $l(g')$, $G' = G \uplus \{g_1, \dots, g_n\}$, and, 
    for $1 \le i \le n$, we have $\prec' = \prec \cup \{(g, g_i)\}$ and 
        $l'(g_i) = l(g) \cup \{v : C \mid (v_i : \bigcirc C) \in l(g')\}$. 

$(G, \prec, l) \Rightarrow_{\downarrow'} (G', \prec', l')$ 

    if $g$ is a leaf in $G$, $v$ is a marked variable in $l(g)$, $g'$ is the ancestor of $g$, 
    for no term $x$ in $l(g')$ do we have 
        $\{C \mid (x : \bigcirc C) \in l(g')\} \subseteq \{C \mid (v : C) \in l(g)\}$, 
    $X = \{\min([v']_{l(g')}) \mid v'$ is a marked variable in $l(g')\}$, 
    $Y_i$ is the $i$th subset of $X$ (for some ordering), 
    $G' = G \uplus \{g_1, \dots, g_{2^{|X|}}\}$, and, for $1 \le i \le 2^{|X|}$, we have $\prec' = \prec \cup \{(g, g_i)\}$ and 
    $l'(g_i)$ is the union of $l(g)$ and the following sets, where we assume $Y_i = \{v_1, \dots, v_n\}$: 
        $\{v : C \mid (v_1 : \bigcirc C) \in l(g')\}$ 
        $\mathrm{copy}(l(g), v, v'_j)$ for $1 < j \le n$ 
        $\{v'_j : C \mid (v_j : \bigcirc C) \in l(g')\}$ for $1 < j \le n$ 
    Here, all newly introduced variables $v'_j$ are marked in $l'(g_i)$. 

Note: For all rules, we assume that $l'(g) = l(g)$ for all $g \in G$. $A \uplus B$ denotes the 
disjoint union of $A$ and $B$. 



Fig. 4. Tableau rules. 
The tableau rules are applied until no further rule application is possible. To 
ensure termination, we must follow a certain strategy of rule applications. 

Definition 6. A tableau is complete if no tableau rule is applicable to it. Let 
$\mathcal{G}_0, \dots, \mathcal{G}_n$ be a sequence of tableaux such that the associated orders $\ll_0, \dots, \ll_n$ 
describe the order of node introduction and, for every $i < n$, there is a tableau 
rule $\Rightarrow_s$ such that $\mathcal{G}_i \Rightarrow_s \mathcal{G}_{i+1}$ and 

- if the rule $\Rightarrow_s$ is one of the generating rules $\Rightarrow_{\ne}$ or $\Rightarrow_{\exists}$, then no tableau rule 
different from $\Rightarrow_{\ne}$, $\Rightarrow_{\exists}$ and $\Rightarrow_{\bigcirc}$ is applicable to $\mathcal{G}_i$, 

- if the rule is $\Rightarrow_{\bigcirc}$, then no other tableau rule is applicable to $\mathcal{G}_i$. 

Then $\mathcal{G}_0, \dots, \mathcal{G}_n$ is said to be built according to the tableau strategy. If this is the 
case, $\mathcal{G}_0 = \mathcal{G}_\vartheta$, and $\mathcal{G}_n$ is complete, then $\mathcal{G}_n$ is called a completion of $\mathcal{G}_\vartheta$. 

The following theorem states that the tableau strategy ensures termination. 

Theorem 2. If the tableau rules are applied according to the tableau strategy, 
then a completion is reached after finitely many steps. 

Let us now turn to the second phase of the algorithm, i.e., to the elimination 
phase. We begin by defining which nodes are blocked. 

Definition 7. Let $\mathcal{G} = (G, \prec, l)$ be a tableau for $\vartheta$. A state $g \in G$ is blocked by 
a state $g' \in G$ if $g' \ll g$ and $l(g')$ is a variant of $l(g)$. We define a new relation 
$\lhd$ by taking $g \lhd g'$ if either $g \prec g'$, or $g$ has a successor $g''$ that is blocked by $g'$. 

An important part of the elimination process deals with so-called eventualities. 
An $L_C$-formula $\alpha \in S$ is called an eventuality for a constraint system $S$ if $\alpha$ 
is either of the form $x : C\,\mathcal{U}\,D$ or of the form $\varphi\,\mathcal{U}\,\psi$. An eventuality is said to 
be unmarked if it is not of the form $v : C\,\mathcal{U}\,D$ for any marked variable $v$. All 
eventualities occurring in the tableau have to be ‘realized’ in the following sense. 

Definition 8. Let $\mathcal{G} = (G, \prec, l)$ be a tableau for $\vartheta$, $g \in G$, and let $\alpha$ be an 
eventuality for $l(g)$. Then $\alpha$ is realized for $g$ in $\mathcal{G}$ if there is a sequence of unblocked 
nodes $g_0 \lhd \dots \lhd g_n$ in $G$ with $g = g_0$, $n \ge 0$, such that the following holds: 

(1) if $\alpha$ is $\varphi\,\mathcal{U}\,\psi$ then $\psi \in l(g_n)$; 

(2) if $\alpha$ is $v : C\,\mathcal{U}\,D$, with $v$ an unmarked or marked variable, then there are variables 
$v_i$ from $l(g_i)$, $i \le n$, with $v_0 = v$, $v_1, \dots, v_n$ unmarked, $(v_n : D) \in l(g_n)$, and, for 
all $i$, $0 < i \le n$, we have 

- if $g_{i-1}$ is a state, then $\{C \mid (v_{i-1} : \bigcirc C) \in l(g_{i-1})\} \subseteq \{C \mid (v_i : C) \in l(g_i)\}$, 

- if $g_{i-1}$ is not a state, then $\{C \mid (v_{i-1} : C) \in l(g_{i-1})\} \subseteq \{C \mid (v_i : C) \in l(g_i)\}$; 

(3) if $\alpha$ is $a : C\,\mathcal{U}\,D$, for some object name $a$, then $(a : D) \in l(g_n)$. 

Intuitively, the variables $v_0, \dots, v_n$ in (2) describe the same element at different 
moments of time. It should be clear that in a tableau representing a quasimodel, 
all eventualities have to be realized. Apart from removing nodes that contain 
clashes, to remove nodes with non-realized eventualities is the main aim of the 
elimination phase. 
Definition 9. Let $\mathcal{G} = (G, \prec, l)$ be a tableau for $\vartheta$. We use the following rules 
to eliminate points in $\mathcal{G}$: 

(e1) if $l(g)$ contains a clash, eliminate $g$ and all its $\lhd^*$-successors 
(where ‘$\lhd^*$-successor’ is the transitive closure of ‘$\lhd$-successor’); 

(e2) if all $\lhd$-successors of $g$ have been eliminated, eliminate $g$ as well; 

(e3) if $l(g)$ contains an unmarked eventuality not realized for $g$, eliminate $g$ and 
all its $\lhd^*$-successors.¹ 

The elimination procedure is as follows. Say that a tableau $\mathfrak{G}_1 = (G_1, \prec_1, l_1)$ is a
subtableau of $\mathfrak{G}_2 = (G_2, \prec_2, l_2)$ if $G_2 \supseteq G_1$ and $\mathfrak{G}_1$ is the restriction of $\mathfrak{G}_2$ to $G_1$.
Obviously, if $\mathfrak{G}_2$ is a tableau for $\vartheta$ and $\mathfrak{G}_1$ contains the root of $\mathfrak{G}_2$, then $\mathfrak{G}_1$ is a
tableau for $\vartheta$. Suppose now that $\mathfrak{G} = (G, \prec, l)$ is a completion of $\vartheta$. We construct
a decreasing sequence of subtableaux $\mathfrak{G} = \mathfrak{G}_0, \mathfrak{G}_1, \dots$ by iteratively eliminating
nodes from $\mathfrak{G}$ according to rules (e1)-(e3), with (e1) being used only at the
first step. (The two other rules are used in turns.) Since we start with a finite
tableau, this process stops after finitely many steps, i.e., we reach a subtableau
$\mathfrak{G}' = (G', \prec', l')$ of $\mathfrak{G}$ to which none of the elimination rules can be applied. We
say that the root of $\mathfrak{G}$ is not eliminated iff $G' \neq \emptyset$.
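The elimination phase can be pictured as a fixpoint computation. The following is only a minimal sketch under stated assumptions, not the procedure of the paper: the names `eliminate`, `has_clash` and `unrealized` are hypothetical, the realization test of Definition 8 is abstracted into a caller-supplied predicate, and rule (e2) is assumed to apply only to nodes that have at least one successor in the original tableau.

```python
def descendants(node, succ):
    """All nodes reachable from `node` in one or more successor steps."""
    seen, stack = set(), [node]
    while stack:
        for child in succ.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def eliminate(nodes, succ, has_clash, unrealized):
    """Return the set of nodes that survive rules (e1)-(e3).

    nodes      -- iterable of all tableau nodes
    succ       -- dict mapping a node to the list of its direct successors
    has_clash  -- hypothetical predicate: does l(g) contain a clash?
    unrealized -- hypothetical predicate (g, alive): does l(g) contain an
                  unmarked eventuality that is not realized among `alive`?
    """
    nodes = list(nodes)
    alive = set(nodes)
    # (e1) is used only at the first step.
    for g in nodes:
        if g in alive and has_clash(g):
            alive -= {g} | descendants(g, succ)
    changed = True
    while changed:  # (e2) and (e3) are used in turns until nothing changes
        changed = False
        for g in list(alive):  # (e2), assumed to skip nodes without successors
            children = succ.get(g, ())
            if children and not any(c in alive for c in children):
                alive.discard(g)
                changed = True
        for g in list(alive):  # (e3)
            if g in alive and unrealized(g, alive):
                alive -= {g} | descendants(g, succ)
                changed = True
    return alive
```

Under these assumptions, the root of the completion is not eliminated exactly when the returned set is non-empty.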

Theorem 3. A $\mathrm{PTL}_{\mathcal{ALC}}$-formula $\vartheta$ is satisfiable iff there is a completion of $\vartheta$
of which the root is not eliminated.

As a consequence of Theorems 2 and 3 we obtain

Theorem 4. There is an effective tableau procedure which, given a $\mathrm{PTL}_{\mathcal{ALC}}$-formula
$\vartheta$, decides whether $\vartheta$ is satisfiable.



6 Conclusion 

This paper, a continuation of the series [14,4,16,11], develops a tableau reasoning
procedure for the temporal description logic $\mathrm{PTL}_{\mathcal{ALC}}$ interpreted in two-dimensional
models with constant $\mathcal{ALC}$ domains. As shown in [12], the algorithm
runs in double exponential time, thus paralleling the complexity of Wolper’s
original PTL-algorithm [17], which solves a PSPACE-complete problem using
exponential time. Despite the high complexity, we believe that the devised tableau
algorithm is an important first step towards the use of TDLs as KR&R tools.
A prototype implementation of the described algorithm is currently underway.
Based on the experiences with this implementation, possible optimization strategies
will be investigated using the work in [9] as a starting point.

An important feature of the developed algorithm is that the DL component
can be made considerably more expressive, provided that the extension is also
supported by a reasonable tableau procedure. One idea we are working on now
is to extend this component to expressive fragments of first-order logic, thereby
obtaining tableau procedures for fragments of first-order temporal logic (cf. [8])
having potential applications in a growing number of fields such as specification
and verification of reactive systems, model-checking, query languages for
temporal databases, etc.

¹ Of course, eventualities which are marked also have to be realized. However, the
fact that all unmarked eventualities in a tableau are realized implies that all other
eventualities are also realized (see proofs).

Another interesting aspect of this paper is that, with minor modifications, the
constructed tableaux can be used as a satisfiability checking procedure for the
Cartesian product of S5 and PTL (cf. [13]), thus contributing to a new exciting
field in modal logic studying the behavior of multi-dimensional modal systems [7].

Abstract. This paper presents a sound and complete free-variable tableau
calculus for constant-domain quantified modal logics, with a propositional
analytical basis, i.e. one of the systems K, D, T, K4, S4. The calculus
is obtained by addition of the classical free-variable $\gamma$-rule and the
“liberalized” $\delta^+$-rule [14] to a standard set of propositional rules. Thus,
the proposed system characterizes proof-theoretically the constant-domain
semantics, which cannot be captured by “standard” (non-prefixed,
non-annotated) ground tableau calculi. The calculi are extended so as to
deal also with non-rigid designation, by means of a simple numerical
annotation on functional symbols, conveying some semantical information
about the worlds where they are meant to be interpreted.



1 Introduction 

Quantified modal logic (QML) can be given a model-theoretical characterization
by extending the propositional Kripke semantics: a first-order modal structure
is a set of first-order classical interpretations (the “possible worlds”), connected
by a binary relation (the accessibility relation). However, things are not quite
so simple, and several issues have to be addressed (see for example [12] for
an overview). Among them, possible restrictions on the designation of terms
and the object domains associated to the possible worlds distinguish different
“variants” of QML. When the interpretation of a ground term is required to
be the same in every world, then it is said to be a rigid term, otherwise it is
non-rigid. Rigid and non-rigid designation can in principle coexist within the
same logic, whenever some symbols are given a rigid interpretation and others
are not. On the contrary, requirements about possible relations between the
universes of different worlds necessarily characterize different logics, the same
way as restrictions on the accessibility relation do, on the propositional side. The
most commonly considered variants of QML, in this respect, are the constant-domain
variant, where the object domain is the same for all worlds, and the
cumulative-domain (or increasing-domain) variant, where the object domains
can vary, but monotonically, i.e. if $w'$ is accessible from $w$, then the object
domain of $w$ is included in the domain of $w'$; when the domains of different
worlds are independent of one another, then domains are said to be varying.

Cumulative-domain QML, with rigid designation only, can easily be given
proof theoretic characterizations, obtained by addition of the principles of classical
first-order logic to a propositional modal system. In fact, in such systems
the converse of the Barcan Formula, which characterizes cumulative domains, is
provable: $\Box\forall x A \to \forall x \Box A$. Rigid designation, on the other hand, is a consequence
of the instantiation rule of classical logic. Quantified modal logics with
cumulative domains and rigid designation have been given sequent and tableau
calculi [9,10,15], natural deduction proof systems [3], matrix proof procedures
[21], resolution style calculi [1,6], and have been treated by means of translation
methods [2,19].
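The three domain conditions distinguished above (constant, cumulative, varying) can be checked mechanically on a finite structure. The sketch below is only an illustration; the function name `classify_domains` and the encoding of a structure as a set of accessibility pairs plus a dictionary of per-world domains are assumptions made for this example.

```python
def classify_domains(access, domain):
    """Classify a finite first-order modal structure by its object domains.

    access -- set of pairs (w, w2), meaning w2 is accessible from w
    domain -- dict mapping each world to its set of objects
    Returns the most specific of "constant", "cumulative", "varying".
    """
    doms = list(domain.values())
    # Constant domains: every world has the same object domain.
    if all(d == doms[0] for d in doms):
        return "constant"
    # Cumulative domains: domains grow monotonically along accessibility.
    if all(domain[w] <= domain[w2] for (w, w2) in access):
        return "cumulative"
    return "varying"


if __name__ == "__main__":
    print(classify_domains({(0, 1)}, {0: {"a"}, 1: {"a", "b"}}))
```

Note that every constant-domain structure is also cumulative; the function reports the strongest condition that holds.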

Constant-domain logics with rigid designation can be treated axiomatically,
by addition of the Barcan formula, $\forall x \Box A \to \Box\forall x A$, to the axioms and rules of
classical first-order logic and modal propositional logics. Beyond the axiomatic
approach, translation methods are general enough to treat constant-domain log- 
ics, and rigid as well as non-rigid designation (see, for instance, [2,19]). Constant- 
domain logics with rigid designation have been formalized in the tableau style, 
but with the addition of prefixes labelling tableau nodes [9], as well as by means 
of matrix proof methods [21]: in both kinds of calculi in fact it is possible to 
analyse more than one possible world at a time, and this allows the proof to “go 
back and forth” (the same mechanism solves a similar problem raised by sym- 
metric logics). In [9] all the variants of QML concerning the object domains of 
possible worlds are treated by prefixed tableau methods. A different approach is 
represented by modal display calculi, where the addition of the classical sequent 
rules for quantifiers captures constant-domain logics [4], and constant, increas- 
ing and decreasing-domain modal logics can all be presented as cut-free display 
sequent calculi, by use of structure-dependent rules for quantifiers [22]. A fur- 
ther direct approach dealing with both varying domains and non-rigid symbols 
has a representative in [16], which defines a resolution method for epistemic log- 
ics, where terms can be annotated by a “bullet” constructor distinguishing rigid 
terms from non-rigid ones. A rather different direction is followed in [11], where 
the language of modal logic is enriched by means of a predicate abstraction op- 
erator, in order to capture differences on the denotation of terms, and a tableau 
proof procedure is presented for such a logic, with no restriction on the domains 
of possible worlds. 

The constant-domain requirement bears some relationship with the symme- 
try of the accessibility relation: if a model is symmetric and its domains are 
cumulative, then it is a constant-domain model. Constant-domain and symmet- 
ric logics also share a proof-theoretical difficulty: it is provable that constant- 
domain logics cannot be captured by “standard” tableau methods, or cut-free 
sequent calculi. In fact, the Craig Interpolation Theorem does not hold for such 
logics [8], while it is a consequence, in some cases, of the existence of sound and 
complete cut-free Gentzen systems [9]: 



It follows that there can be no “reasonable” cut-free tableau system for
such logics for if there were we could use it, or the related symmetric
Gentzen system, to prove an Interpolation result ([9], p.383).

This paper shows that, however, constant-domain logics with a propositional
analytical basis (i.e. one of the systems K, D, T, K4, S4, whose Gentzen calculi
enjoy the cut-elimination property) can be given a simple proof-theoretical
characterization by means of free-variable tableau systems. (Obviously, the proof
of the Interpolation Theorem given in [9] for cumulative-domain QML does not
extend to such systems; see also Section 5.) Moreover, non-rigid designation
can also be treated, by means of a simple numerical annotation on functional
symbols, conveying some semantical information about the worlds where they
are meant to be interpreted. A similar annotation mechanism is used in the
tableau calculi for cumulative and varying domain QML, with either rigid or
non-rigid designation, presented in [7].

The paper is organized as follows: Section 2 gives an informal presentation of 
the free-variable calculus for constant-domain logics with rigid designation only,
and the main intuitions justifying its appropriateness. The syntax and formal 
semantics of constant-domain logics with both rigid and non-rigid designation 
are presented in Section 3, and the tableau systems for such logics in Section 4, 
where the main lines of the soundness and completeness proofs are also given. 
Section 5 concludes this work. 



2 The Role of the $\delta$-Rule in Modal Calculi



Ground tableau systems for QML with increasing domains and rigid designation
are easily obtained by addition of the classical quantifier rules to the modal
systems [9]:

$$(\gamma_G)\ \frac{\forall x A,\ S}{A[t/x],\ \forall x A,\ S} \qquad\qquad (\delta_G)\ \frac{\exists x A,\ S}{A[c/x],\ S}$$

In the $\gamma_G$-rule, $t$ is any ground term occurring in the branch. In the $\delta_G$-rule, $c$
is a new constant that does not occur in $\{\exists x A\} \cup S$.

The ground approach cannot be adapted to the constant-domain case. A
naive and intuitive account of the difficulties that are encountered can be given
as follows. In order to be complete, the expansions of a universal formula $\forall x A$
should include the instances $A[c/x]$ where $c$ is a constant denoting an object
known to exist in a further accessible world. For instance, if we attempt to prove
the Barcan formula (with the $\pi$-rule for system K):

$$\begin{array}{cl}
\neg(\forall x \Box A \to \Box\forall x A) & \\
\forall x \Box A,\ \neg\Box\forall x A & (\alpha) \\
\Box A[a/x],\ \forall x \Box A,\ \neg\Box\forall x A & (\gamma_G) \\
A[a/x],\ \neg\forall x A & (\pi_K) \\
A[a/x],\ \neg A[c/x] & (\delta_G)
\end{array}$$

we observe that, in order to obtain a closed tableau, we need to get a leaf
containing also $A[c/x]$. But, after the application of the $\pi$-rule, the universal
formula is not available any more. So, in principle, the formula $\forall x \Box A$ should be
expanded to $\Box A[c/x]$, before the application of the $\pi$-rule. This is reasonable,
since the object denoted by $c$, that will further on be known to exist, belongs to
the domain of the presently considered world, too. But, if we change the tableau,
replacing every occurrence of the constant $a$ with $c$, then the constant $c$ cannot
be used any more in the application of the $\delta_G$-rule, since it is not “new” to the
node.

Free-variable tableau calculi for QML with increasing domains and rigid designation
can be obtained in the same straightforward manner as the ground
calculi are, by addition of the classical free-variable rules, and the substitution
rule, to a modal propositional system [10]. The quantifier rules are:

$$(\gamma)\ \frac{\forall x A,\ S}{A[v/x],\ \forall x A,\ S} \qquad\qquad (\delta)\ \frac{\exists x A,\ S}{A[f(v_1, \dots, v_k)/x],\ S}$$

In the $\gamma$-rule, $v$ is a new free variable (also called parameter). In the $\delta$-rule, $f$
is a new functional symbol, and $v_1, \dots, v_k$ are all the parameters occurring in
$\{\exists x A\} \cup S$.

It has been proved [14] that the classical calculus stays sound if, in the $\delta$-rule,
only the parameters actually occurring in $\exists x A$ are required to be the arguments
of the Skolem function. Such a “liberalized” rule is called the $\delta^+$-rule. Although
in the classical case the $\delta^+$-rule is a more efficient, but equivalent, reformulation
of the $\delta$-rule, it is not sound with respect to the cumulative-domain variant of
QML. In fact, the following tableau shows that the $\delta^+$-rule makes the Barcan
formula provable in a modal calculus (with the $\pi$-rule for system K):

$$\begin{array}{cl}
\neg(\forall x \Box A \to \Box\forall x A) & \\
\forall x \Box A,\ \neg\Box\forall x A & (\alpha) \\
\Box A[v/x],\ \forall x \Box A,\ \neg\Box\forall x A & (\gamma) \\
A[v/x],\ \neg\forall x A & (\pi_K) \\
A[v/x],\ \neg A[c/x] & (\delta^+)
\end{array}$$



The tableau above is closed by application of the substitution {c/v}. 

In this paper we show that the free-variable modal tableau calculi obtained
by replacing the $\delta$-rule with the $\delta^+$-rule are sound and complete with respect
to the constant-domain variants of QML, on a propositional basis that is one of
the analytical systems K, D, T, K4 or S4. Note, however, that this fact does
not imply that Skolemization preserves satisfiability in constant-domain QML,
because quantifiers cannot always cross the modal operators. In fact, $\forall x \Diamond A \to \Diamond\forall x A$
is not valid, even in constant-domain QML. As a consequence, although
run-time Skolemization is allowed, formulae cannot be initially skolemized.

Intuitively, the shift from cumulative to constant domains caused by the
“liberalization” on the arguments of Skolem functions corresponding to the
$\delta^+$-rule is due to the following reasons. The key remark is that the role of the
parameters $v_1, \dots, v_k$ in the new Skolem term $f(v_1, \dots, v_k)$, introduced by the
$\delta$-rule in free-variable calculi, is to prevent the unification of such a term with
any of $v_1, \dots, v_k$. Thus, the effect of reducing the set of parameters in Skolem
terms is to make more unifications possible, in particular also the unification
of a parameter $v$ with a Skolem term that still has to be introduced when $v$ is
generated.
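The effect of the Skolem arguments on unification can be made concrete on the last node of the Barcan tableau. The following is a minimal sketch, assuming a toy term representation (parameters are strings, compound terms are tuples whose first component is the function symbol) and hypothetical helper names; it is a textbook unification with occurs check, not code from the paper.

```python
def is_var(t):
    return isinstance(t, str)


def walk(t, s):
    """Follow bindings of a parameter in substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    return t


def occurs(v, t, s):
    t = walk(t, s)
    if is_var(t):
        return t == v
    return any(occurs(v, a, s) for a in t[1:])


def unify(a, b, s=None):
    """Return a unifying substitution extending s, or None if none exists."""
    s = {} if s is None else dict(s)
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        if occurs(a, b, s):
            return None  # occurs check: v cannot be bound to f(..., v, ...)
        s[a] = b
        return s
    if is_var(b):
        return unify(b, a, s)
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s


# Node before the existential step: A[v/x] and an existential formula that
# contains no parameters. Take A = p(x) for illustration.
delta_term = ("f", "v")   # delta-rule: arguments are all parameters of the node
delta_plus_term = ("f",)  # delta+-rule: only parameters of the formula itself

closed_with_delta = unify(("p", "v"), ("p", delta_term))
closed_with_delta_plus = unify(("p", "v"), ("p", delta_plus_term))
```

With the $\delta$-rule the occurs check blocks the closure, while the $\delta^+$-rule yields a term without arguments that the parameter can be bound to, which is exactly what makes the Barcan formula provable.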

Clearly, in order to preserve soundness, the unification of a parameter $v$ with a
term $f(v_1, \dots, v_k)$ must be forbidden when the existential quantifier “generating”
$f(v_1, \dots, v_k)$ is in the scope of the universal quantifier generating $v$, otherwise
the dependence $\forall$-$\exists$ is lost. As a consequence, in that case $v$ must be one of
$v_1, \dots, v_k$, so that $v$ and $f(v_1, \dots, v_k)$ are not unifiable. With this exception, $v$ and
$f(v_1, \dots, v_k)$ can be unified: if the two terms are introduced “in the same world”,
the reason is the same as in the classical case. Also, if $v$ ranges on the domain
of a previous world $w$, but the corresponding quantifier does not dominate the
existential quantifier corresponding to the Skolem term, then it can be unified
with such a term, because, in the constant-domain variant of QML, the object
existing in any world accessible from $w$ belongs to the domain of $w$ too.

On the contrary, in the free-variable calculus for cumulative-domain logics
the parameters occurring in a Skolem term $f(v_1, \dots, v_k)$ must include all the
parameters introduced in “previous” worlds, because a universal quantifier varying
on the domain of a world $w$ must not be instantiated with a term denoting an
object possibly belonging only to the domain of another world (accessible from
$w$). Cumulative-domain QML can be given a free-variable tableau calculus with
the liberalized $\delta^+$-rule, but only with the addition of numerical annotations on
functional symbols, in the style of [7]. In fact, in that case, the unification of
a parameter related to a world $w$ with a term that is not guaranteed to belong
to the domain of $w$ is prevented by a suitable restriction on the notion of
substitution, that takes into account symbol annotations.

The tableau system for constant-domain QML considered above is a special 
case of the calculus presented in detail in Section 4, where both rigid and non- 
rigid designation are allowed. 

3 Constant-Domain QML with Rigid and Non-rigid 
Terms 

A first order modal language $L$ is constituted by logical symbols (propositional
connectives, quantifiers, modal operators and a countable set $X$ of individual
variables), a non empty set $L_P$ of predicate symbols, and a set $L_F$ of functional
symbols with an associated arity. The set $L_F$ is partitioned into a set $L_{F_R}$ of rigid
functional symbols, and a disjoint set $L_{F_{NR}}$ of non-rigid functional symbols, i.e.
$L_F$ is the union of $L_{F_R}$ and $L_{F_{NR}}$. Constants are considered as functional
symbols with null arity. Terms are built by use of symbols from $L_F$ and $X$ in the
usual way. We consider modal formulae in negation normal form, i.e. built out
of literals (atoms and negated atoms) by use of $\wedge$, $\vee$, $\Box$, $\Diamond$ and the quantifiers
$\forall$ and $\exists$. Negation over non-atomic formulae and implication are considered as
defined symbols.

In the case of constant domain QML, a first-order modal interpretation $\mathcal{M}$
of a language $L$ is a tuple $(W, w_0, R, D, \phi, \pi)$ such that:

- $W$ is a non empty set (the set of “possible worlds”);

- $w_0$ is a distinguished element of $W$ (the “initial world”);¹

- $R$ is a binary relation on $W$ (the accessibility relation); $wRw'$ abbreviates
$\langle w, w' \rangle \in R$;

- $D$ is a non empty set (the object domain);

- $\phi$ represents the interpretation of constants and functional symbols in the
language: for every world $w \in W$ and $k$-ary functional symbol $f \in L_F$ (with
$k \geq 0$),

$$\phi(w, f) \in D^{D^k}$$

Moreover, if $f \in L_{F_R}$, then for all $w, w' \in W$, $\phi(w, f) = \phi(w', f)$.

- $\pi$ is the interpretation of predicate symbols: if $p$ is a $k$-ary predicate symbol
and $w \in W$, then $\pi(w, p) \subseteq D^k$ is a set of $k$-tuples of elements in $D$.

The interpretation function $\phi$ is extended to terms in the usual way, and, by
an abuse of notation, $\phi(w, t)$ denotes the interpretation of $t$ in $w$.

If $\mathcal{M} = (W, w_0, R, D, \phi, \pi)$ is an interpretation of the language $L$, the language
of the model $\mathcal{M}$, $L(D)$, is obtained from $L$ by addition of a “name” for
each $d \in D$, i.e. a new constant $\bar{d}$. It is assumed that for every $d \in D$ and
$w \in W$, $\phi(w, \bar{d}) = d$. Note that the interpretation of names $\bar{d}$ is always rigid, i.e.
$\bar{d} \in L(D)_{F_R}$.

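The relation $\models$ between an interpretation $\mathcal{M} = (W, w_0, R, D, \phi, \pi)$, a world
$w \in W$ and a closed formula in $L(D)$ is defined inductively as follows:

1. $\mathcal{M}, w \models p(t_1, \dots, t_n)$ iff $\langle \phi(w, t_1), \dots, \phi(w, t_n) \rangle \in \pi(w, p)$.

2. $\mathcal{M}, w \models \neg A$ iff $\mathcal{M}, w \not\models A$.

3. $\mathcal{M}, w \models A \wedge B$ iff $\mathcal{M}, w \models A$ and $\mathcal{M}, w \models B$.

4. $\mathcal{M}, w \models A \vee B$ iff $\mathcal{M}, w \models A$ or $\mathcal{M}, w \models B$.

5. $\mathcal{M}, w \models \forall x A$ iff for all $d \in D$, $\mathcal{M}, w \models A[\bar{d}/x]$.

6. $\mathcal{M}, w \models \exists x A$ iff there exists $d \in D$ such that $\mathcal{M}, w \models A[\bar{d}/x]$.

7. $\mathcal{M}, w \models \Box A$ iff for all $w' \in W$ such that $wRw'$, $\mathcal{M}, w' \models A$.

8. $\mathcal{M}, w \models \Diamond A$ iff there is a $w' \in W$ such that $wRw'$ and $\mathcal{M}, w' \models A$.

A closed formula $A$ is true in $\mathcal{M}$ iff $\mathcal{M}, w_0 \models A$, and it is valid iff it is true in
all interpretations.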
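For finite interpretations, the eight clauses above translate almost literally into a recursive evaluator. The sketch below is an illustration under stated assumptions: formulae and terms are encoded as nested tuples, the model is a dictionary with the hypothetical keys `W`, `R`, `D`, `phi` and `pi`, and quantifiers are handled with a variable environment instead of substituting names $\bar{d}$.

```python
def term_val(M, w, t, env):
    """Interpretation of term t in world w (variables are looked up in env)."""
    if isinstance(t, str):
        return env[t]
    f, *args = t
    return M["phi"][(w, f)](*[term_val(M, w, a, env) for a in args])


def holds(M, w, A, env=None):
    """Satisfaction of formula A at world w of the finite interpretation M."""
    env = env or {}
    op = A[0]
    if op == "atom":
        _, p, args = A
        tup = tuple(term_val(M, w, t, env) for t in args)
        return tup in M["pi"].get((w, p), set())
    if op == "not":
        return not holds(M, w, A[1], env)
    if op == "and":
        return holds(M, w, A[1], env) and holds(M, w, A[2], env)
    if op == "or":
        return holds(M, w, A[1], env) or holds(M, w, A[2], env)
    if op == "all":  # the same domain D is used in every world
        return all(holds(M, w, A[2], {**env, A[1]: d}) for d in M["D"])
    if op == "ex":
        return any(holds(M, w, A[2], {**env, A[1]: d}) for d in M["D"])
    if op == "box":
        return all(holds(M, v, A[1], env) for (u, v) in M["R"] if u == w)
    if op == "dia":
        return any(holds(M, v, A[1], env) for (u, v) in M["R"] if u == w)
    raise ValueError(f"unknown connective: {op}")


def implies(a, b):
    """Implication as a defined symbol."""
    return ("or", ("not", a), b)


def p(t):
    return ("atom", "p", [t])


barcan = implies(("all", "x", ("box", p("x"))), ("box", ("all", "x", p("x"))))
swap = implies(("all", "x", ("dia", p("x"))), ("dia", ("all", "x", p("x"))))

# Hypothetical countermodel for `swap`: two successors, each satisfying p
# for a different object of the shared domain.
M = {"W": {0, 1, 2}, "R": {(0, 1), (0, 2)}, "D": {"a", "b"}, "phi": {},
     "pi": {(1, "p"): {("a",)}, (2, "p"): {("b",)}}}
```

In the model `M`, the Barcan instance holds at world 0, while the formula `swap`, which moves a universal quantifier across a possibility operator, fails there.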

The accessibility relation $R$ of a modal structure can be required to satisfy
additional properties, characterizing different logics: we consider seriality (D),
reflexivity (T), transitivity (K4), both reflexivity and transitivity (S4). When
no additional assumption on $R$ is made, the logic is K.
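The correspondence between frame conditions and the five logics can be checked directly on a finite frame. This is a small sketch with hypothetical function names; it only restates the conditions listed above.

```python
def is_serial(W, R):
    return all(any((w, v) in R for v in W) for w in W)


def is_reflexive(W, R):
    return all((w, w) in R for w in W)


def is_transitive(W, R):
    return all((u, x) in R for (u, v) in R for (v2, x) in R if v == v2)


def logics_of_frame(W, R):
    """Names of the considered logics whose frame condition (W, R) satisfies."""
    logics = ["K"]  # K imposes no condition on R
    if is_serial(W, R):
        logics.append("D")
    if is_reflexive(W, R):
        logics.append("T")
    if is_transitive(W, R):
        logics.append("K4")
    if is_reflexive(W, R) and is_transitive(W, R):
        logics.append("S4")
    return logics
```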

4 The Free-Variable Tableau System 

The language of any tableau for a set of formulae in the language $L$ extends $L$
with a denumerable set of parameters (or free variables) and a denumerable set
of Skolem functional symbols. Moreover it contains annotated symbols $f^n$, for
every $f \in L_{F_{NR}}$ and $n \in \mathbb{N}$.

¹ The semantics with a distinguished initial world dates back to [17]. Obviously, the
notion of validity (truth in every model) coincides with the semantics where no initial
world is singled out.

Definition 1. Let $L$ be a modal language, $V = \{v_1, v_2, \dots\}$ an infinite denumerable
set of new symbols, called free variables or parameters, and $F_{Sk}$ a denumerable
set of new function symbols such that, for each $k \in \mathbb{N}$, $F_{Sk}$ contains
infinitely many function symbols of arity $k$. Then the labelled free-variable
extension of $L$ is the language $L^*$ such that:

$$L^*_P = L_P$$

$$L^*_{F_R} = L_{F_R} \cup F_{Sk} \cup \{f^n \mid f \in L_{F_{NR}},\ n \in \mathbb{N}\}$$

$$L^*_{F_{NR}} = L_{F_{NR}}$$

Terms in $L^*$ are built up by use of $L^*_F$, $X$ and $V$ in the usual way.

Note that, in the definition above, Skolem functions and annotated functional
symbols are considered as rigid symbols.

Definition 2.

1. A symbol occurrence in a formula or set of formulae is called a non-modal
occurrence if it is in the scope of no modal operators.

2. An $n$-annotated modal formula is a modal formula where non-modal occurrences
of non-rigid functional symbols are all annotated with $n$. A formula
is annotated if it is $n$-annotated for some $n$.

3. A term $t$ is completely annotated iff every non-rigid functional symbol in $t$
is annotated.

4. A modal substitution $\sigma$ is a function from the set of free variables $V$ to
completely annotated terms in $L^*$, that is the identity almost everywhere.
Substitutions are denoted as usual by expressions of the form $\{t_1/v_1, \dots, t_m/v_m\}$.
Unifiers, i.e. solutions of unification problems, and most general unifiers
(m.g.u.) are defined as in the classical case.

5. If $A$ is a modal formula and $n \in \mathbb{N}$, then $A^n$ is obtained from $A$ by annotating
each non-modal occurrence of a non-rigid functional symbol with $n$. If $S$ is
a set of modal formulae, then $S^n = \{A^n \mid A \in S\}$.

For instance, if $f \in L_{F_R}$, $c, g \in L_{F_{NR}}$, and $A = \forall x(p(f(x, c), g(x)) \wedge \Box q(g(c)))$,
then $A^n = \forall x(p(f(x, c^n), g^n(x)) \wedge \Box q(g(c)))$.

6. If $S$ is a set of modal formulae and $n \in \mathbb{N}$, then the single node $S^n$ is an
initial tableau for $S$.
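Operation 5 of Definition 2 is a simple structural recursion that stops at modal operators. The sketch below reproduces the example of the definition; the tuple encoding of formulae and the convention of writing an annotated symbol as the string `"g^3"` are assumptions made for this illustration.

```python
def annotate_term(t, n, nonrigid):
    """Annotate every non-rigid functional symbol of term t with n."""
    if isinstance(t, str):  # an individual variable or a parameter
        return t
    f, *args = t
    g = f"{f}^{n}" if f in nonrigid else f
    return (g, *[annotate_term(a, n, nonrigid) for a in args])


def annotate(A, n, nonrigid):
    """Compute A^n: annotate only the non-modal occurrences in formula A."""
    op = A[0]
    if op == "atom":
        return ("atom", A[1], [annotate_term(t, n, nonrigid) for t in A[2]])
    if op in ("box", "dia"):
        return A  # occurrences under a modal operator are left untouched
    if op in ("all", "ex"):
        return (op, A[1], annotate(A[2], n, nonrigid))
    if op == "not":
        return ("not", annotate(A[1], n, nonrigid))
    return (op, annotate(A[1], n, nonrigid), annotate(A[2], n, nonrigid))


# A = forall x (p(f(x, c), g(x)) and box q(g(c))), with f rigid, c and g non-rigid
A = ("all", "x",
     ("and",
      ("atom", "p", [("f", "x", ("c",)), ("g", "x")]),
      ("box", ("atom", "q", [("g", ("c",))]))))

A3 = annotate(A, 3, {"c", "g"})
```

Symbols that already carry an annotation are not members of the non-rigid set under this encoding, so annotating twice leaves the first annotation in place.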

The main intuition behind the annotation of non-rigid symbols is that, since
a symbol in $L_{F_{NR}}$ may occur with different annotations in a tableau branch,
its annotations distinguish the designations of such a symbol in the different
worlds corresponding to the tableau nodes. Non-modal occurrences of non-rigid
symbols are initialized with a given annotation $n$ (for instance 0), that identifies
the initial world of the searched model. Symbols of $L_{F_{NR}}$ occurring in the scope
of a modal operator receive their annotations only when, by application of a
modal rule, they come to the surface (by means of operation 5 in Definition 2).

Classical propositional rules:

$$(\alpha)\ \frac{A \wedge B,\ S}{A,\ B,\ S} \qquad\qquad (\beta)\ \frac{A \vee B,\ S}{A,\ S \quad\mid\quad B,\ S}$$

$\nu$-rules:

$$(\nu_D)\ \frac{\Box S,\ S'}{S^m} \quad \text{where } m \in \mathbb{N} \text{ is new in the whole tableau}$$

$$(\nu_T)\ \frac{\Box A,\ S}{A^n,\ \Box A,\ S} \quad \text{where } n \in \mathbb{N} \text{ either occurs as an annotation of some symbol in the premise or is new in the whole tableau}$$

$\pi$-rules:

$$(\pi_K)\ \frac{\Diamond A,\ \Box S,\ S'}{A^m,\ S^m} \quad \text{where } m \in \mathbb{N} \text{ is new in the whole tableau}$$

$$(\pi_4)\ \frac{\Diamond A,\ \Box S,\ S'}{A^m,\ S^m,\ \Box S} \quad \text{where } m \in \mathbb{N} \text{ is new in the whole tableau}$$

Fig. 1. Propositional expansion rules



We consider a simple set of propositional tableau rules (others may be chosen
as well), shown in Figure 1, where $S$, $S'$ are sets of modal formulae (in $L^*$), $\Box S$
stands for $\{\Box A \mid A \in S\}$, $S'$ is a set of non-boxed modal formulae, and comma is
set union. The propositional part of the tableau systems we consider consists of
the classical rules, and: the rule $\pi_K$ in the systems K, D and T, the rule $\pi_4$ in
K4 and S4, the rule $\nu_T$ in T and S4, the rule $\nu_D$ in D. The quantifier expansion
rules are:



Quantifier rules

$$(\gamma)\ \frac{\forall x A,\ S}{A[v/x],\ \forall x A,\ S} \quad \text{where } v \text{ is a new parameter}$$

$$(\delta^+)\ \frac{\exists x A,\ S}{A[f(v_1, \dots, v_k)/x],\ S}$$

where $f$ is a new Skolem function, i.e. a symbol in $F_{Sk}$ that does not
occur elsewhere in the tableau, and $v_1, \dots, v_k$ are all the parameters
occurring in $\exists x A$



Adopting a terminology from [13], the rules $\nu_D$, $\pi_K$ and $\pi_4$ are called dynamic,
the others are called static. So, a non-annotated non-rigid symbol can get
an annotation only with the application of dynamic rules. Note, moreover, that
only non-modal occurrences of non-rigid functional symbols in a tableau node
can be annotated. As a consequence, every tableau node is a set of $n$-annotated
formulae, for some $n \in \mathbb{N}$. In fact, the expansions obtained by application of a
dynamic rule never contain symbols with different annotations (i.e. if both $f^n$
and $g^k$ occur in the expansion, then $n = k$). Such a property is preserved by
applications of the $\nu_T$-rule. Therefore, no tableau node contains symbols with
different annotations.

The substitution rule we adopt here requires the following preliminary defi- 
nitions: 

- If $P$ and $Q$ are two atomic formulae in $L^*$, then $P = Q$ if $P$ and $Q$ are
identical, including their annotations.

- Let $S_1, \dots, S_k$ be all the leaves of a free-variable tableau $T$, and, for each
$i = 1, \dots, k$, let $P_i$ and $\neg Q_i$ be literals in $S_i$. If the modal substitution $\sigma$ is
a solution of the unification problem $P_1 = Q_1, \dots, P_k = Q_k$, then $\sigma$ is an
atomic closure substitution for $T$. If it is a most general solution for such a
unification problem, then it is a most general atomic closure substitution for
$T$.

The substitution rule is then the following: 



Most General Atomic Closure Substitution Rule

If $T$ is a tableau for a set $S$ of sentences in $L$ and the modal substitution
$\sigma$ is a most general atomic closure substitution for $T$, then the tree $T\sigma$,
obtained by applying $\sigma$ to $T$, is a tableau for $S$.



A tableau is closed iff each of its leaves contains a pair of complementary
literals, i.e. literals $P$ and $\neg P$. A closed tableau for a formula $\neg A$ is a tableau
proof of $A$, and a closed tableau for a set of formulae $S$ is a refutation of $S$.



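As a first example, consider the language $L$ with the unary predicates $p$ and $q$,
the rigid constant $c$ and the non-rigid unary functional symbol $f$. The following
tableau shows that $\{\Diamond p(f(c)),\ \forall x \Box(p(x) \to q(x)),\ \neg\Diamond q(f(c))\}$ is refutable in K,
since it can be closed by the substitution $\{f^0(c)/v\}$:

$$\begin{array}{cl}
\Diamond p(f(c)),\ \forall x \Box(p(x) \to q(x)),\ \neg\Diamond q(f(c)) & \\
\Diamond p(f(c)),\ \Box(p(v) \to q(v)),\ \forall x \Box(p(x) \to q(x)),\ \neg\Diamond q(f(c)) & (\gamma) \\
p(f^0(c)),\ p(v) \to q(v),\ \neg q(f^0(c)) & (\pi_K) \\
p(f^0(c)),\ \neg p(v),\ \neg q(f^0(c)) \quad\mid\quad p(f^0(c)),\ q(v),\ \neg q(f^0(c)) & (\beta)
\end{array}$$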

And now a tableau that should not and indeed does not close, thanks to the
different annotations of $f$ in different worlds:

$$\begin{array}{cl}
p(f^0(c)),\ \forall x(p(x) \to \Box q(x)),\ \neg\Box q(f(c)) & \\
p(f^0(c)),\ p(v) \to \Box q(v),\ \forall x(p(x) \to \Box q(x)),\ \neg\Box q(f(c)) & (\gamma) \\
p(f^0(c)),\ \neg p(v),\ \forall x(p(x) \to \Box q(x)),\ \neg\Box q(f(c)) \quad\mid\quad p(f^0(c)),\ \Box q(v),\ \forall x(p(x) \to \Box q(x)),\ \neg\Box q(f(c)) & (\beta)
\end{array}$$

The tableau above is in fact a failed attempt to give a refutation of the satisfiable
set $S = \{p(f(c)),\ \forall x(p(x) \to \Box q(x)),\ \neg\Box q(f(c))\}$, where $c$ is rigid and $f$ is
non-rigid, in any of the systems K, D or T. The tableau cannot be closed, since the
unification problem $p(f^0(c)) = p(v)$, $q(v) = q(f^1(c))$ has no solution (note that
$S$ is satisfiable because $f$ is non-rigid). And no matter how many applications
of the $\gamma$-rule are added to the tableau above, the set $S$ cannot be refuted.
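The two closure attempts can be replayed as simultaneous unification problems. The following sketch uses a textbook unification routine over a toy term encoding (parameters are strings, compound terms are tuples, and an annotated symbol such as $f^1$ is written as the distinct symbol `"f^1"`); the helper names are hypothetical and the code is not taken from the paper.

```python
def walk(t, s):
    """Follow bindings of a parameter in substitution s."""
    while isinstance(t, str) and t in s:
        t = s[t]
    return t


def occurs(v, t, s):
    t = walk(t, s)
    if isinstance(t, str):
        return t == v
    return any(occurs(v, a, s) for a in t[1:])


def unify(a, b, s):
    """Extend substitution s so that a and b become equal, or return None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(a, str):
        return None if occurs(a, b, s) else {**s, a: b}
    if isinstance(b, str):
        return unify(b, a, s)
    if a[0] != b[0] or len(a) != len(b):
        return None  # different symbols, e.g. f^0 versus f^1
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s


def solve(problem):
    """Solve a list of equations P_i = Q_i with one common substitution."""
    s = {}
    for lhs, rhs in problem:
        s = unify(lhs, rhs, s)
        if s is None:
            return None
    return s


c = ("c",)
# First example: both leaves live in the same world, annotation 0.
first = [(("p", ("f^0", c)), ("p", "v")), (("q", "v"), ("q", ("f^0", c)))]
# Second example: p is evaluated in world 0, q in the new world 1.
second = [(("p", ("f^0", c)), ("p", "v")), (("q", "v"), ("q", ("f^1", c)))]

first_solution = solve(first)
second_solution = solve(second)
```

The first problem is solved by binding $v$ to $f^0(c)$; in the second, the same binding is forced by the first equation and then clashes with $f^1(c)$, so no atomic closure substitution exists.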

Note that, in the absence of non-rigid symbols, i.e. when $L_{F_{NR}} = \emptyset$, the
calculus presented in this section results exactly from the addition of the classical
free-variable $\gamma$- and $\delta^+$-rules to a standard tableau calculus for any of the
considered propositional modal logics.

4.1 Soundness and Completeness of the Free-Variable Calculus

In this section we give a sketchy account of the soundness and completeness
proofs. The proofs make use of a modal Substitution Theorem, stating that, if
$\mathcal{M} = (W, w_0, R, D, \phi, \pi)$ is an interpretation of a modal language $L$, $w \in W$, $A$
is a formula with only the variable $x$ free, and $t$, $t'$ are ground terms of $L$, then:

1. If the interpretations of $t$ and $t'$ are rigid and equal, i.e. for all $w' \in W$,
$\phi(w', t) = \phi(w', t')$, then $\mathcal{M}, w \models A[t/x]$ iff $\mathcal{M}, w \models A[t'/x]$.

2. If $x$ does not occur in $A$ in the scope of any modal operator and $\phi(w, t) = \phi(w, t')$,
then $\mathcal{M}, w \models A[t/x]$ iff $\mathcal{M}, w \models A[t'/x]$.

Theorem 1 (Soundness). If there is a closed tableau for S, then S is unsat- 
isfiable. 

The soundness proof runs along standard lines, except for the fact that the 
following notion of tableau satisfiability must be considered: 

Definition 3. Let $T$ be a tableau, with parameters among $v_1, \dots, v_k$, and let
$\mathcal{M} = (W, w_0, R, D, \phi, \pi)$ be a modal interpretation. Then $\mathcal{M} \models T$ iff for all
$d_1, \dots, d_k \in D$ there is a leaf $S$ of $T$ and a world $w \in W$ such that

$$\mathcal{M}, w \models S\{\bar{d}_1/v_1, \dots, \bar{d}_k/v_k\}$$

$T$ is satisfiable iff $\mathcal{M} \models T$ for some interpretation $\mathcal{M}$.

Soundness follows from the fact that, if $S$ is a satisfiable set of formulae, then
any tableau for $S$ is satisfiable. The proof of this fact is an induction on tableaux.
We just note here that it is the case of the substitution rule that makes essential
use of the hypothesis that the object domain is the same for each world.

Theorem 2 (Completeness). If S is an unsatisfiable set of modal formulae, 
then there exists a closed tableau for S. 

The completeness proof also follows the standard approach consisting of the
construction of a canonical model of a set of formulae having no closed tableau.
In general, we deal with possibly infinite sets of formulae, and make use of the
following notion of tab-consistency: $S$ is tab-consistent iff for every finite subset
$S' \subseteq S$ there is no closed tableau for $S'$.

The reformulation of the basic notion of downward saturated sets of formulae
is the following: a (finite or infinite) set $S$ of $n$-annotated modal formulae,
possibly containing parameters, is $n$-downward saturated iff

1. $S$ does not contain any pair of literals $P_1$ and $\neg P_2$ such that $P_1$ and $P_2$ are
unifiable;

2. if $A \wedge B \in S$ then $A \in S$ and $B \in S$;

3. if $A \vee B \in S$ then either $A \in S$ or $B \in S$;

4. if $\forall x A \in S$ then $A[v/x] \in S$ for every parameter $v \in V$;

5. if $\exists x A \in S$, and $v_1, \dots, v_k$ are all the parameters occurring in $A$, then for
some Skolem functional symbol $f$, $A[f(v_1, \dots, v_k)/x] \in S$;

6. if the logic is either T or S4 and $\Box A \in S$ then $A^n \in S$.

It can easily be proved that if a set $S$ of $n$-annotated modal formulae is
tab-consistent, then:

1. If $A \wedge B \in S$, then $S \cup \{A, B\}$ is tab-consistent.

2. If $A \vee B \in S$, then either $S \cup \{A\}$ or $S \cup \{B\}$ is tab-consistent.

3. If $\forall x A \in S$ then $S \cup \bigcup_{v \in V} \{A[v/x]\}$ is tab-consistent.

4. If $\exists x A \in S$, $f$ is a functional symbol that does not occur in $S$ and $v_1, \dots, v_n$
are all the parameters occurring in $A$, then $S \cup \{A[f(v_1, \dots, v_n)/x]\}$ is
tab-consistent.

5. If the logic is either T or S4 and $\Box A \in S$, then $S \cup \{A^n\}$ is tab-consistent.

The result stated above is used to prove the following: 

Lemma 1. Let $L$ be a modal language, $n \in \mathbb{N}$ and $S$ a (finite or infinite) set of
$n$-annotated modal formulae in $L^*$. If $H$ is a set containing an infinite number of
functional symbols (for each arity) that do not occur in $S$ and $S$ is tab-consistent,
then there exists a tab-consistent set $S^\infty \supseteq S$ of $n$-annotated formulae (possibly
containing also symbols from $H$), such that $S^\infty$ is $n$-downward saturated.

The main idea behind the proof of the lemma above is standard and consists
of the construction of a (pseudo-)tableau $T^\infty$, rooted at $S$, in such a way that
$T^\infty$ is closed with respect to the application of static expansion rules. Such a
construction uses Skolem functions in $H$ in the applications of the $\delta^+$-rule. The
tree $T^\infty$ is not properly a tableau, since its nodes may be infinite sets of formulae.
In presence of such infinite objects, some care has to be taken in defining a “fair”
rule application strategy.

The canonical model of a tab-consistent set of formulae S in $L^*$ is built 
along standard lines, using Lemma 1. The initial world $w_0$ is a tab-consistent 
and 0-downward saturated superset of $S^0$, and each world w in the model is a 
tab-consistent and n-downward saturated set of n-annotated formulae, for some 
$n \in \mathbb{N}$ that is uniquely associated to w (and is called its “name”): name(w) = n. 
To each world w is also associated a set of Skolem functions H(w) to be used in 
the application of Lemma 1, in such a way that if $w \neq w'$ then H(w) and H(w') 
are disjoint. The interpretation of the language is “syntactical”: its domain D is 
the set of completely annotated ground terms in $L^*$, and the interpretation $\phi$ of 
terms is “almost” Herbrand-like: for all $w \in W$ 

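- if $f$ is either a k-ary rigid symbol in L or a k-ary Skolem function, then for 
all $t_1, \dots, t_k \in D$: $\phi(w, f)(t_1, \dots, t_k) = f(t_1, \dots, t_k)$ 

- if $f$ is a k-ary non-rigid symbol in L, then for all $t_1, \dots, t_k \in D$: 

for all $m \in \mathbb{N}$: $\phi(w, f^m)(t_1, \dots, t_k) = f^m(t_1, \dots, t_k)$ 

and, if name(w) = n: $\phi(w, f) = \phi(w, f^n)$ 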

So, the interpretation of annotated symbols is always rigid and, for all $w \in W$ 
and completely annotated ground term t in $L^*$, $\phi(w, t) = t$; i.e. if $t \in D$ then 
$\phi(w, t) = t$. In particular, the interpretation of completely annotated terms is 
rigid. 

Finally, let $\sigma$ be any surjective function from the set of parameters V to the 
set of completely annotated ground terms of the extended language: for each 
completely annotated ground term t there exists $v \in V$ such that $\sigma(v) = t$. If A 
is a formula, then $\sigma(A)$ denotes the formula obtained from A by replacement of 
each parameter $v$ in A by $\sigma(v)$. If S is a set of formulae, $\sigma(S) = \{\sigma(A) \mid A \in S\}$. 
The interpretation function $\pi$ is then defined as follows: for all $w \in W$ and k-ary 
predicate symbols p in $L^*$: 

$$\pi(w, p) = \{\langle t_1, \dots, t_k \rangle \mid p(t_1, \dots, t_k) \in \sigma(w)\}$$ 

It can be proved that, for all $w \in W$ and for every formula A: 

if $A \in w$ then $\mathcal{M}, w \models \sigma(A)$ 

Hence, in particular, $\mathcal{M}, w_0 \models \sigma(w_0)$ and, since $S^0$ contains no parameters and 
$S^0 \subseteq w_0$, $\mathcal{M}, w_0 \models S^0$. This easily implies that $\mathcal{M}, w_0 \models S$. 

5 Concluding Remarks 

This work shows that the addition of the free-variable $\gamma$-rule and $\delta^+$-rule, rather 
than the more constraining $\delta$-rule, to standard propositional modal tableau 
calculi captures proof-theoretically the constant domain semantics for QML in the 
case of rigid designation.¹ Moreover, non-rigid designation can also be accounted 
for, by means of simple annotations on non-rigid functional symbols. This 
makes apparent the sensitivity of modal free-variable tableau calculi to 
alternative formulations of the $\delta$-rule, contrary to the classical case, and similarly 
to what happens for other non-classical logics. For instance, in [5], where a 
free-variable sequent system which is sound and complete for linear logic is proposed, 
it is shown that an analogous liberalization of the $\forall_R$ and the $\exists_L$ rules is 
unsound: in that case, too, restricting the set of arguments of Skolem functions 
results in a shift to a different “logic”. [20] and [18] make a fine analysis of the 
relationship between the arguments of Skolem functions in run-time Skolemization 
and rule permutability in cut-free sequent calculi for intuitionistic and linear 
logic, respectively. 

¹ Although complete proofs have been carried out only for logics whose propositional 
basis is among K, D, T, K4, S4, the extension to other cut-free propositional calculi 
should be straightforward. 

As already observed, even in the case of rigid designation, “standard” (i.e. 
non-annotated, non-prefixed) ground tableau systems for constant-domain QML 
cannot be given, as a consequence of the failure of Craig’s Interpolation Theorem 
[8,9]. However, the use of unification in the quantifier rules, which is the essential 
feature of free-variable tableaux, makes it possible to give a proof-theoretical 
characterization of constant domains in the Fitting-Gentzen style. 

In Figure 2, some of the inference rules of a sequent calculus for QML which 
is sound and complete with respect to constant domains and rigid designation 
are provided. This calculus is formulated in a completely standard Gentzen style, 
but for the fact that unification and run-time Skolemization are embedded in 
the quantifier rules. A proof of a sequent $\mathcal{S}$ in such a formal system is given by a 
deduction tree T, whose root is $\mathcal{S}$, and a substitution $\sigma$ such that all the leaves of 
$T\sigma$ are axioms, i.e. sequents of the form $S, A \vdash A, S'$. For the sake of concision, 
we present here only the calculus for system T. 

Such sequent systems can obviously be reformulated as symmetric calculi 
(in the sense of [9], i.e. calculi where formulae never cross the sequent arrow), 
taking negation as a defined connective and eliminating the rules for negation and 
implication. In this case, axioms include also sequents of the form $S, A, \neg A \vdash S'$ 
and $S \vdash A, \neg A, S'$. The equivalence of such symmetric calculi and the tableau 
calculi presented in Section 4 is immediate: if S and S' are sets of formulae in 
negation normal form, a sequent $S \vdash S'$ is provable in the symmetric sequent 
calculus if and only if the set of formulae $S \cup \{\neg A \mid A \in S'\}$ has a refutation in 
the (corresponding) tableau system of Section 4. 

We conclude this section showing where an attempt to adapt Fitting’s proof 
of the Interpolation Theorem for “ordinary” cut-free sequent calculi ([9]) to the 
calculus presented in Figure 2 fails. In fact, the rules $\exists_L$ and $\forall_R$ (corresponding 
to the $\delta^+$-rule of the free-variable tableau system) do not preserve the existence 
of interpolants. This is not surprising: such rules are not locally sound, since 
“liberalized Herbrandization” (the $\delta^+$-rule) does not correspond to the eigenvariable 
condition in sequent systems, as standard Herbrandization (the $\delta$-rule) does. As 
a simple example, let us consider the following deduction tree in the (symmetric) 
sequent calculus: 

$$
\dfrac{\dfrac{\dfrac{p(v) \vdash p(c)}{p(v) \vdash \forall x\, p(x)}\,(\forall_R)}{\forall x \Box p(x),\ \Box p(v) \vdash \Box \forall x\, p(x)}\,(\Box_R)}{\forall x \Box p(x) \vdash \Box \forall x\, p(x)}\,(\forall_L)
$$
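$$\dfrac{S \vdash S', A}{S, \neg A \vdash S'}\,(\neg_L) \qquad \dfrac{S, A \vdash S'}{S \vdash S', \neg A}\,(\neg_R)$$

$$\dfrac{S, A \vdash S' \quad S, B \vdash S'}{S, A \vee B \vdash S'}\,(\vee_L) \qquad \dfrac{S \vdash S', A, B}{S \vdash S', A \vee B}\,(\vee_R)$$

$$\dfrac{S, \forall x A, A[v/x] \vdash S'}{S, \forall x A \vdash S'}\,(\forall_L) \qquad \dfrac{S \vdash S', A[f(v_1, \dots, v_n)/x]}{S \vdash S', \forall x A}\,(\forall_R)$$

In $(\forall_L)$, $v$ is a new parameter; in $(\forall_R)$, $v_1, \dots, v_n$ are all the parameters in A. 

$$\dfrac{S, A[f(v_1, \dots, v_n)/x] \vdash S'}{S, \exists x A \vdash S'}\,(\exists_L) \qquad \dfrac{S \vdash S', \exists x A, A[v/x]}{S \vdash S', \exists x A}\,(\exists_R)$$

In $(\exists_L)$, $v_1, \dots, v_n$ are all the parameters in A; in $(\exists_R)$, $v$ is a new parameter. 

$$\dfrac{S, A, \Box A \vdash S'}{S, \Box A \vdash S'}\,(\Box_L) \qquad \dfrac{S \vdash S', A}{\Box S, S_1 \vdash \Diamond S', \Box A, S_2}\,(\Box_R)$$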

Fig. 2. Rules of a Free-Variable Sequent Calculus for T-QML with Constant Domains 
and Rigid Designation 



An application of the substitution $\{c/v\}$ produces a proof of the valid sequent 
(in T-QML with constant domains) $\forall x \Box p(x) \vdash \Box \forall x\, p(x)$ (corresponding to the 
Barcan formula), starting with the inference: 

$$\dfrac{p(c) \vdash p(c)}{p(c) \vdash \forall x\, p(x)}\,(\forall_R)$$

Now, following the lines of Fitting’s proof, we immediately find the trivial 
interpolant $p(c)$ for the axiom. Such a formula should be an interpolant also for the 
sequent $p(c) \vdash \forall x\, p(x)$, derived via an application of the $\forall_R$-rule (corresponding 
to a $\delta^+$-rule in the tableau systems). But this is clearly false. 
Abstract. An experimentation activity in automated set-reasoning 
is reported. The methodology adopted is based on an equational 
re-engineering of ZF set theory within the ground formalism $\mathcal{L}^\times$ developed 
by Tarski and Givant. On top of a kernel axiomatization of map algebra 
we develop a layered formalization of basic set-theoretical concepts. A 
first-order theorem prover is exploited to obtain automated certification 
and validation of this layered architecture. 

Keywords. Set reasoning, map algebra, first-order theorem proving. 



Introduction 

Any basic mathematical concept can be suitably formulated within axiomatic 
Set Theory, which can hence be regarded as the most promising, as well as 
challenging, arena for automated theorem-provers. 

Sustained efforts have been devoted to experimentation with state-of-the-art 
theorem provers, to get automated proofs of common set-theoretical theorems. 
Different axiomatic systems of set theory have been tried, to determine which 
one offers the best support to resolution-based theorem provers (such as Otter, 
cf. [12,16]). 

In much of the experimentation activities carried out in the past (cf., e.g., [3, 
11]), the von Neumann-Gödel-Bernays axiomatization of set theory has been 
preferred, because NGB offers a finite first-order axiomatization. On the other 
hand, to mention just one alternative approach, [9] and [10] resort to higher order 
features of Isabelle to deal with the Zermelo-Fraenkel set theory. The recourse 
to higher order logic turns out to be mandatory, because ZF cannot be finitely 
axiomatized in first-order logic. 

Deepening the same approach proposed in [6], this paper will focus on an 
equational rendering of ZF. Our formulation of the axioms is based on the 
formalism $\mathcal{L}^\times$ of [15], which is equational and devoid of variables. A theory stated 
in $\mathcal{L}^\times$ can easily be emulated through a first-order system, simply by treating 
the meta-variables that occur in the schematic formulation of its axioms (both 
the logical axioms and the ones endowed with a genuinely set-theoretic content) 
as if they were first-order variables. In practice, this means treating ZF as if it 
were an extension of the theory of relation algebras [14]. As an immediate 
consequence, we obtain a finite axiomatization (stronger but retaining the traits) of 
ZF; this can be achieved since variables are not supposed to range over sets but 
over the dyadic relations on the universe of sets. 

log(SETA)) and by MURST (PGR-2000: Automazione del ragionamento in teorie 
insiemistiche). 

By taking the results presented in [6] as a starting point, we report on the 
beginnings of an experimentation activity exploratorily based on the first-order 
theorem-prover Otter. This prover will be exploited to provide an inferential 
apparatus for $\mathcal{L}^\times$. In turn, this apparatus will serve as the basis on which an 
inferential machinery for set-reasoning will grow. 



1 A Layered Development of Map-Reasoning 

$\mathcal{L}^\times$ is a ground equational language where one can state properties of dyadic 
relations (maps) over a domain $\mathcal{U}$ of discourse. The basic ingredients of $\mathcal{L}^\times$ are 
three constants $\emptyset$, $\mathbb{1}$, $\iota$; three dyadic constructs $\cap$, $\triangle$, $\circ$ of map intersection, 
map symmetric difference, and map composition, respectively; and the monadic 
construct ${}^{-1}$ of map conversion. Then, a map expression is any term of the 
following signature: 



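| symbol:   | $\emptyset$ | $\mathbb{1}$ | $\iota$ | $\in$ | $\cap$ | $\triangle$ | $\circ$ | ${}^{-1}$ | $\overline{\phantom{P}}$ | $\setminus$ | $\cup$ | $\dagger$ |
|-----------|---|---|---|---|---|---|---|---|---|---|---|---|
| degree:   | 0 | 0 | 0 | 0 | 2 | 2 | 2 | 1 | 1 | 2 | 2 | 2 |
| priority: |   |   |   |   | 5 | 3 | 6 | 7 |   | 2 | 2 | 4 |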



The map whose properties we intend to specify is the membership relation 
$\in$ over the class $\mathcal{U}$ of all sets. Hence, $\in$ is the only primitive map letter of $\mathcal{L}^\times$. 

The language consists of map equalities $Q = R$, where Q and R are map 
expressions. A number of derived constructs and shorthands for map equalities 
can be easily introduced (cf. Fig. 1). 

For an interpretation $\Im$ of $\mathcal{L}^\times$, one fixes a nonempty subset $\in^{\Im}$ of $\mathcal{U}^2 =_{\mathrm{Def}} \mathcal{U} \times \mathcal{U}$. 
Then each map expression P designates a map $P^{\Im}$ on the basis of the usual 
evaluation rules, e.g.: 

$$(Q \circ R)^{\Im} =_{\mathrm{Def}} \{ [a, b] \in \mathcal{U}^2 : \text{there are } c\text{s in } \mathcal{U} \text{ for which } [a, c] \in Q^{\Im} \text{ and } [c, b] \in R^{\Im} \}.$$ 

Accordingly, an equality $Q = R$ turns out to be either true or false in $\Im$. 
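
The evaluation rules above can be illustrated on a small finite domain. The following Python fragment is only a sketch under our own assumptions: maps are represented as sets of pairs, and the names `compose`, `converse` and `constants` are hypothetical helpers introduced here for illustration, not part of the formalism or of Otter.

```python
from itertools import product

def compose(q, r):
    """Q o R: all (a, b) such that a Q c and c R b for some c."""
    return {(a, b) for (a, c) in q for (d, b) in r if c == d}

def converse(q):
    """Q^-1: every pair of Q reversed."""
    return {(b, a) for (a, b) in q}

def constants(universe):
    """Return the empty map, the universal map and the identity map over `universe`."""
    full = set(product(universe, repeat=2))
    return set(), full, {(a, a) for a in universe}

U = range(3)                      # a toy three-element domain (assumption)
EMPTY, FULL, IOTA = constants(U)
Q = {(0, 1), (1, 2)}
R = {(1, 1), (2, 0)}

QR = compose(Q, R)                # pairs linked through an intermediate c
holds = compose(IOTA, Q) == Q     # the equality "iota o Q = Q" evaluated in this interpretation
```

Intersection and symmetric difference need no helper in this representation, since Python's `&` and `^` on sets already behave as map intersection and map symmetric difference.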

The logical axioms characterizing the derivability notion $\vdash$ for $\mathcal{L}^\times$ (cf. Fig. 1) 
will be supplemented with proper axioms reflecting one’s conception of $\mathcal{U}$ as being 
a hierarchy of nested sets over which $\in$ behaves as membership. 

It must be said that there is no representation theorem that plays for map 
algebras a role analogous to the Stone theorem for Boolean algebras (cf. [2]). 
In other words, there exist equalities that are true in all algebras of dyadic 
relations over a fixed U but which are false in some structure which, though 
fulfilling the axioms of map algebra, does not consist of relations. This defect 
will presumably propagate to any set theory formulated as an extension of the 
map algebra; but anyway, even in first-order logic, a set theory never reflects the 
intended semantics univocally, and hence the map-algebraic formulation and the 
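Derived constructs: 

$P \cup Q =_{\mathrm{Def}} P \triangle Q \triangle P \cap Q$ 
$P \setminus Q =_{\mathrm{Def}} P \triangle P \cap Q$ 
$\overline{P} =_{\mathrm{Def}} P \triangle \mathbb{1}$ 
$P \dagger Q =_{\mathrm{Def}} \overline{\overline{P} \circ \overline{Q}}$ 
$P \subseteq Q \equiv_{\mathrm{Def}} P \cap Q = P$ 
$\mathrm{funcPart}(P) =_{\mathrm{Def}} P \setminus P \circ \overline{\iota}$ 
$\mathrm{Func}(P) \equiv_{\mathrm{Def}} P^{-1} \circ P \subseteq \iota$ 
$\mathrm{lAbs}(P) \equiv_{\mathrm{Def}} P = \mathbb{1} \circ P$ 
$\mathrm{Total}(P) \equiv_{\mathrm{Def}} P \circ \mathbb{1} = \mathbb{1}$ 

Axioms: 

$P \cap Q = Q \cap P$ 
$P \cap (Q \triangle R) \triangle P \cap Q = P \cap R$ 
$(P \star_1 Q) \star_1 R = P \star_1 (Q \star_1 R)$ 
$\iota \circ P = P$ 
$P^{-1-1} = P$ 
$(P \star_2 Q)^{-1} = Q^{-1} \star_2 P^{-1}$ 
$((P \triangle Q) \triangle P \cap Q) \circ R = (Q \circ R \triangle P \circ R) \triangle Q \circ R \cap P \circ R$ 
$P^{-1} \circ (R \cap (P \circ Q \triangle \mathbb{1})) \cap Q = \emptyset$ 
$\mathbb{1} \cap P = P$ 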

Fig. 1. Derived constructs and axioms for map algebra. 



logical one can, with their limitations, be on a par. The results reported in [4], 
which we will briefly review in Sec. 2, constitute a verification of this fact. 
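
As a sanity check on the intended semantics, one can test some derived constructs and axioms of Fig. 1 on randomly generated finite relations. The sketch below is ours and rests on assumptions: it uses the usual set-theoretic reading of the operators, our reading of the definitions of $\cup$, $\setminus$ and complement in Fig. 1, and hypothetical helper names (`comp`, `conv`, `check_laws`). Passing such a test is of course no substitute for a derivation in the calculus.

```python
import random
from itertools import product

U = range(4)                                   # toy domain (assumption)
FULL = set(product(U, repeat=2))               # the universal map
IOTA = {(a, a) for a in U}                     # the identity map

def comp(q, r):
    """Map composition Q o R."""
    return {(a, b) for (a, c) in q for (d, b) in r if c == d}

def conv(q):
    """Map conversion Q^-1."""
    return {(b, a) for (a, b) in q}

def union(p, q):
    """P u Q defined through symmetric difference and intersection."""
    return (p ^ q) ^ (p & q)

def diff(p, q):
    """P \\ Q defined as P sym-diff (P n Q)."""
    return p ^ (p & q)

def compl(p):
    """Complement defined as P sym-diff the universal map."""
    return p ^ FULL

def check_laws(trials=200, seed=0):
    """Evaluate a few definitions and axioms on random maps; raise on failure."""
    rng = random.Random(seed)
    for _ in range(trials):
        p, q, r = ({pair for pair in FULL if rng.random() < 0.4} for _ in range(3))
        assert union(p, q) == p | q            # derived union is ordinary union
        assert diff(p, q) == p - q             # derived difference is ordinary difference
        assert compl(p) == FULL - p
        assert comp(IOTA, p) == p
        assert conv(comp(p, q)) == comp(conv(q), conv(p))
        assert comp(union(p, q), r) == union(comp(q, r), comp(p, r))
        assert (comp(conv(p), r & compl(comp(p, q))) & q) == set()
    return True
```

Calling `check_laws()` returns `True` when every sampled instance satisfies the listed equalities.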

Otter is a resolution-style theorem prover developed at the Argonne National 
Laboratory (cf. [8]). It can manipulate statements written in full first-order logic 
with equality. The inference rules available in Otter are: binary resolution, (or- 
dered) hyperresolution, UR-resolution, and binary paramodulation. Otter’s main 
features include: forward and backward demodulation, forward and backward 
subsumption, (a variant of) the Knuth-Bendix completion method, weight functions 
and lexical ordering, etc. Moreover, Otter offers a large number of parameters 
and options to help the user in guiding the inference process. In what follows we 
briefly illustrate those we found most useful in our experimentation. This will 
be done by giving the reader a description of the basic strategy we adopted in 
proving theorems with Otter. As we will see, in most cases this strategy worked 
well, whereas we needed some kind of tuning in order to successfully cope with 
a few theorems. 

Since we are dealing with equality, we selected the Knuth-Bendix completion 
procedure; whenever non-unit clauses or non-equational predicates entered into 
play, we enabled hyperresolution and binary resolution. Paramodulation was 
employed. We usually exploited the default strategies for ordering, demodulation, 
and weighting. Nevertheless, we made systematic use of the parameters devoted 
to limit the search space. In particular, all theorems were proved by imposing 
bounds on the maximum number of literals and distinct variables occurring in 
any derived clause. Moreover, we often imposed a threshold on the weight of 
derived clauses. We also adopted Otter’s default weighting strategy (cf. [8]); in 
some cases we found it useful to give extra weight to certain terms or literals in 
order to reduce the time spent in finding a proof. Here are the Otter settings we 
used in almost all experiments we report on (for the parameters and flags not 
mentioned here, we kept the values adopted by Otter’s autonomous mode): 
    % Strategy:
    set(knuth_bendix).
    set(back_demod).
    set(para_from).
    set(hyper_res).
    set(para_into).
    set(binary_res).
    set(dynamic_demod_all).

    % Limits on the search space:
    assign(max_distinct_vars, 3).
    assign(max_literals, 1).
    assign(max_weight, 18).



Notice that the value assigned to max_weight was usually ‘guessed’ by taking 
into account the syntactical structural complexity of the theorem to be proved. 

Initial experimentation in map reasoning with Otter has been described in [1] ; 
in [6] an equational re-engineering of set theories is presented. Automated set 
reasoning based on this equational formulation of ZF set theory was explored 
in [4]. In particular, in [4] the authors obtained a (semi-) automated proof of a 
fundamental result: by assuming the axioms of a weak set theory (namely, exten- 
sionality, null-set, single-element addition and removal) it was possible to derive 
the existence of a pair of projections satisfying the pairing axiom (cf. Sec. 2, to 
be seen) . This result guarantees the equipollence in means of proof of the equa- 
tional formulation of ZF w.r.t. its first-order version (cf. [15]). We will briefly 
survey this result in Sec. 2. 

The experimentation reported in [4] was essentially carried out by exploiting 
the autonomous mode supplied by Otter and by always adopting the default 
settings. The explicit tuning of parameters and flags was avoided in order to 
obtain a higher independence of the approach from the specific theorem prover. 
Since the syntactic complexity of the theorems tackled in [4] was quite low, this 
approach represented a viable choice. 

The experimentation activity we are going to describe here, is aimed at prov- 
ing theorems that involve set-theoretical concepts whose syntactical and seman- 
tical complexity grow as the experimentation proceeds. This fact can easily be 
grasped by considering the higher level of abstraction of notions such as totality 
or functionality w.r.t. the basic map constructs. To reflect this growth in com- 
plexity, we will develop a layered hierarchy of lemmas. Starting with a ‘kernel’ 
consisting of the constructs and axioms of Fig. 1, we will proceed systematically 
by defining new set-theoretical concepts and by proving groups of laws that 
characterize the new set-constructs. Each one of these extension steps will be 
a (potential) part of the basis for the next extension. Moreover, in proving a 
generic theorem, it will be possible to select a subset of the available constructs, 
together with their laws. This, actually, will help the search for the proof in two 
orthogonal ways: firstly, Otter will deal only with the part of the global 
environment that the user judges to be relevant and related to the theorem to be proved; 
and secondly, the inference activity will be better focused at the most suitable 
level of abstraction. For instance, in proving a law that infers the totality of the 
composition of maps from the totality of the components (cf. Fig. 8), a deep 
treatment of ‘low level’ concepts such as the intrinsic properties of symmetric 
difference should not be needed. 

The first step in the development of our layers consists in proving a series of 
auxiliary laws for the kernel constructs (namely, $\triangle$, $\cap$, $\circ$, ${}^{-1}$). From the 
theoretical point of view, these laws are not necessary to prove any (provable) theorem of 
map calculus. Nevertheless, experimentation revealed that Otter was unable to 
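| group | law | premises | length | timing | generated | kept |
|-------|-----|----------|--------|--------|-----------|------|
| I1 | $P \cap \emptyset = \emptyset$ | Ax | 20 | 7 | 1120 | 185 |
|    | $P \cap P = P$ | Ax | 20 | 13 | 2304 | 382 |
|    | $P \cap (P \cap Q) = P \cap Q$ | Ax | 27 | 13 | 2157 | 318 |
| I2 | $P \cap Q = P \wedge Q \cap P = Q \rightarrow Q = P$ | Ax, I1 | 1 | < 1 | 2 | 24 |
|    | $P \cap Q = P \wedge Q \cap R = Q \rightarrow P \cap R = P$ | Ax, I1 | 2 | 3 | 162 | 62 |
| S1 | $P \triangle Q = Q \triangle P$ | Ax | 7 | 2 | 195 | 52 |
|    | $P \triangle (Q \triangle R) = Q \triangle (P \triangle R)$ | Ax | 8 | 4 | 258 | 54 |
|    | $\emptyset \triangle P = P$ | Ax | 20 | 8 | 1124 | 190 |
|    | $P \triangle P = \emptyset$ | Ax | 16 | 5 | 1110 | 180 |
|    | $P \triangle (P \triangle Q) = Q$ | Ax | 5 | 2 | 234 | 52 |
|    | $\iota \cap (P \triangle P^{-1}) = \emptyset$ | Ax, S1 | 199 | 5m 30s | 6360755 | 13842 |
|    | $P \cap (Q \triangle R) = (P \cap Q) \triangle (P \cap R)$ | Ax, I1, S1 | 2 | 2 | 120 | 45 |
| G1 | $\emptyset^{-1} = \emptyset$ | Ax | 22 | 8 | 1434 | 226 |
|    | $\mathbb{1}^{-1} = \mathbb{1}$ | Ax | 4 | < 1 | 85 | 40 |
|    | $\iota^{-1} = \iota$ | Ax | 3 | < 1 | 38 | 22 |
|    | $(P \triangle \mathbb{1})^{-1} = P^{-1} \triangle \mathbb{1}$ | Ax, S1 | 43 | 1.33s | 24972 | 2033 |
|    | $(P \triangle Q)^{-1} = P^{-1} \triangle Q^{-1}$ | Ax, S1, G1 | 89 | 1.12s | 17147 | 1554 |
| C1 | $\emptyset \circ P = \emptyset$ | Ax | 26 | 9 | 1447 | 231 |
|    | $P \circ \emptyset = \emptyset$ | Ax | 17 | 8 | 1378 | 219 |
|    | $P \circ \iota = P$ | Ax | 4 | 2 | 38 | 23 |
|    | $\mathbb{1} \circ \mathbb{1} = \mathbb{1}$ | Ax | 29 | 20 | 3215 | 526 |
|    | $((P \circ P^{-1}) \cap \iota) \circ P = P$ | Ax, G1, C1 | 66 | 18.53s | 221080 | 8774 |
|    | $P \circ ((P^{-1} \circ P) \cap \iota) = P$ | Ax, G1, C1 | 71 | 19.02s | 227467 | 8844 |
|    | $P \cap (P \circ \mathbb{1}) = P$ | Ax | 62 | 6.36s | 68558 | 6734 |
|    | $P \cap (\mathbb{1} \circ P) = P$ | Ax | 61 | 6.08s | 67926 | 6646 |

Fig. 2. Laws on the primitive map constructs: $\cap$, $\triangle$, ${}^{-1}$, and $\circ$ 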



prove several simple theorems in a reasonable amount of time without 
employing these auxiliary laws. A conspicuous part of the laws regarding the primitive 
constructs is shown in Fig. 2. 

The laws are divided into groups because each group usually corresponds to 
an input file that could be loaded into Otter; moreover, the laws in the same 
group were usually proved by adopting similar settings for parameters and search 
controls, and often by using the same groups of premises as hypotheses. 

For each law in the tables, we indicated: a) the groups of formulas given to 
Otter as input; b) the length of the proof found by Otter; c) the time spent (unless 
otherwise specified, it is expressed in hundredths of a second); d) the number 
of clauses generated during the inference process; e) the number of clauses being 
kept (i.e., the generated clauses that fulfill all restrictions on weight, number of 
variables, number of literals, etc.). In our experimentation we used Otter 3.0.6 
running under Linux on a PC (Pentium III-450, with 128Mbyte of RAM). 

Notice that sometimes there are more kept clauses than generated clauses. 
This is because the former include all clauses obtained by processing the input set 
of formulas. The writing ‘Ax’ reported for most of the laws, does not necessarily 
mean that all of the axioms of Fig. 1 have been fed into Otter; usually this is 
the case only when no other group of laws is employed in the proof; otherwise, 
just (part of) the axioms regarding the constructs occurring in the theorem have 
been given in input. For instance, to prove the law 

$$((P \circ P^{-1}) \cap \iota) \circ P = P \qquad (1)$$ 
law 


premises 


length 


timing 


generated 


kept 


Ni 


P — P 


Ax 


5 


2 


195 


53 




0 = 1 


Ax 


21 


9 


1229 


318 




1 = 0 


Ax 


17 


9 


1215 


308 




pnQ = QA(PnQ) 


Ax 


11 


4 


361 


77 




FAQ = FAQ 


Ax 


9 


2 


257 


57 




F A F = 1 


Ax 


2 


< 1 


40 


24 




F n F = 0 


Ax 


18 


15 


2210 


496 


N2 


F“^ = F-i 


Ax,Ni,Si,Ii,Gi 


1 


2 


0 


40 




FAF = 1 


" 


1 


2 


0 


40 




FnQ = F->FnQ = 0 


" 


4 


3 


164 


68 




FnQ = 0->FnQ = F 


" 


8 


4 


181 


71 




L n P“^ 0 p — L 


" 


20 


17 


2336 


467 




PZ\Q = pnQnPnQ 


" 


18 


37 


5012 


1435 




PZ\Q = PnQnPnQ 


" 


42 


10m 36s 


11780356 


13860 




PZ\Q = PnQnPnQ 


Ax,Ni,Si,Ii,Gi,N2.6 


7 


10 


1645 


385 




F^nQ^ = (QnF)^^ 


" 


5 


4 


560 


182 




(FAQ)'i = FnQnFng 


" 


3 


2 


0 


43 



Fig. 3. Laws on map complementation 



of group C1, we exploited the laws of G1 and those of C1 (meaning with this 
that Otter was allowed to use the laws listed before (1) in C1); moreover, we 
loaded the portion of Ax relative to $\circ$ and to ${}^{-1}$. 

Figures 3 and 4 list the laws on map complementation and map union, 
respectively. The definitions of these constructs in terms of the primitive ones are 
listed in Fig. 1, together with the map formalization of other notions that will 
come into play in the sequel. 

Other laws on map composition and expressing properties of $\iota$ are listed 
in Fig. 6. In order to prove these laws, Otter needed to employ the defined map 
constructs of complementation and union, together with their laws. It should be 
noticed that Otter was not able to prove, in a reasonable amount of time, several 
of the laws of Fig. 6 without using the laws in I1, C1, G1, U1,2,3,4. 

Next come the laws on map inclusion and left-absoluteness. This extension 
of the signature can be considered as preparatory for the study on totality and 
functionality of maps. In turn, the laws on totality and functionality will play 
a crucial role in proving the set-theoretical theses we will report on in later 
sections. 

A few remarks on the behavior of Otter confronted with map calculus are 
due. Firstly, experimentation revealed that, in general, proving a theorem/law 
seems to be more challenging (with our inference machinery) when the map $\iota$ or 
some of its properties are involved. Consider, for instance, the penultimate law 
in Fig. 2, and the laws involving $\iota$ in C1 or C2. The same can be said for those 
laws that correspond to deep intrinsic characteristics of $\iota$, such as the property: 

for each $P \subseteq \iota$ it holds that $P^{-1} = P$ (2) 

This phenomenon could be intuitively explained by observing that statements 
such as (2) assert properties that do not concern the map as a single object, 
but predicate on a relationship holding between the components of each pair 
law 


premises 


length 


timing 


generated 


kept 


Ui 


PUQ = QUP 


Ax 


8 


< 1 


107 


46 




0 U P = P 


Ax 


19 


3 


675 


122 




1 U P = 1 


Ax 


6 


3 


210 


65 




PU P = P 


Ax 


24 


13 


1746 


478 




p n (Q n (PU P)) = p n Q 


Ax 


37 


17 


1951 


604 




Pu(PnQ) = p 


Ax 


33 


16 


1916 


559 




(pn Q) u (Pn Q) := Q 


Ax 


35 


18 


1996 


624 




P U P = 1 


Ax, Ni 


9 


2 


0 


28 




PUQ = PnQ 


Ax 


19 


11 


1298 


448 


U2 


PU(PUQ) = PUQ 


Ax, Ui 


6 


2 


101 


68 




(P U Q) U P = P U (Q U P) 


Ax, Ii,Ci,Ui 


6 


2.74s 


69861 


1047 




PU (QU P) = QU (PU P) 


n 


4 


2.62s 


68421 


1035 




(p u Q) n (P u P) = p u (Q n P) 


n 


13 


1.41s 


39504 


709 




P U (Q U (P n P)) = P U (Q U P) 


n 


14 


81 


23781 


582 




(p u Q) u (P n P) = p u (Q u P) 


n 


11 


11 


2232 


300 


U3 


PUQ = 0 -s-P = 0 


Ax,U2 


2 


4 


233 


68 




PAQ = (p n Q) u (P n Q) 


" 


82 


1.84s 


26090 


2116 




(p u Q) n (P n Q) = (P n Q) u (P n Q) 


n 


53 


37 


7033 


792 




PAQ = (PuQ)n (PnQ) 


n 


43 


1.44s 


25517 


1802 




1 . n ((p n p-i) u (P n P”’^)) = 0 


n 


35 


9.60s 


101784 


9462 




1 . n ((p n p-i) u (P n p^^)) = 0 


Ax,U2,U3 


6 


5 


0 


94 


U4 


(P U Q) 0 P = (P 0 P) U (Q 0 P) 


Ax 


9 


2 


288 


144 




(P 0 (Q u P))-’‘ = ((P 0 Q) U (P 0 p))-’^ 


Ax, Gi 


42 


42 


5959 


1508 




P 0 (Q U P) = (P 0 Q) U (P 0 P) 


Ax, U 4 


2 


4 


377 


141 



Fig. 4. Some of the laws on map union proved by Otter 





law 


premises 


length 


timing 


generated 


kept 


Yi 


P oQr\R — 0 -?■ P~^ 0 Rr\Q — 0 


Ax 


56 


13 


2104 


328 


Ti 


P = 0V1OPO1 = 1 


Simpl, Ax 


13 


22 


6252 


362 




P01 = 1V10P=:1 


Simpl, Ax 


2 


2 


240 


62 



Fig. 5. Cycle law and some consequences of simplicity 



belonging to the map. In a sense, statements of this kind can be thought of as 
having a ‘deeper character’ or, in other words, as modeling a sort of deep knowledge 
of the domain(s) of discourse. 

Secondly, simple syntactical changes (preserving the semantics) in the thesis 
to be proved sometimes badly affect Otter’s performance. 

For instance, the proof of (3) (see also Fig. 3) was relatively easy if compared 
with the one of (4), which is obtainable from (3) by just applying the rule 
$P = Q \vdash \overline{P} = \overline{Q}$ and by exploiting the double-negation law $\overline{\overline{P}} = P$. 

PAQ = PnQnPnQ (3) 

PAQ = PnQnPnQ (4) 



To find a possible justification of this ‘unstable’ behavior, we have to consider 
that Otter adopts a default lexicographic ordering of terms (whenever the user 
does not supply his own criterion), in order to orient the rewriting rules (recall 
that Knuth-Bendix completion is employed), and to handle demodulation and 
weighting. In the above-mentioned case, the default ordering is the same for 
both theses, but it works better with the former of them. Changing the crite- 
law 


premises 


length 


timing 


generated 


kept 


C 2 


P n {P 0 (Q n l)) = P 0 (Q n t) 


Ax,Ii,Ci,Gi,Ui, Yi 


21 


20.61s 


236370 


13644 




P n ((Q n I.) 0 P) = (Q n t) 0 P 


" 


21 


40.52s 


584457 


15052 




p n L = P~^ n t 


" 


76 


40.34s 


568993 


14885 




(P n i.)-^ = p-i n L 


" 


3 


7 


946 


160 




(P n = p n t 


" 


74 


43.47s 


616878 


15167 




P~^ 0 P n L — L 


Ax,Ii,Ci.2.Gi,Ui.Yi 


13 


4.78s 


59433 


6707 


C3 


(p-1 0 ((P 0 Q) A 1)) n Q = 0 


Ax 


5 


9 


1217 


241 




(P-^ 0 (P n (1 A (P 0 Q)))) n Q = 0 


Ax 


34 


15 


2472 


442 


c;, 


(p-i oPoQ)nQ = 0 


Ax, Ii,Ci^ 3 ,Gi,Yi 


2 


2 


204 


46 




{P"^’ 0 (p n P 0 Q)) n Q = 0 


Ax, Ii,Ci^ 3 ,Gi,Yi 


4 


9 


2335 


192 



Fig. 6. More laws on map composition and t 



rion for lexicographic ordering (in proving (4)) would have determined a better 
performance. 

As a last remark on this phenomenon, notice that, as one expects, the proof 
of (4) turns out to be extremely easy (cf. Fig. 3) when (3) is included among 
hypotheses. 

There are also cases of laws whose proofs become easier if some additional 
lemmas are given in input (cf., for instance, U3 or lAbs1). This is a motivation 
for our choice of splitting the laws regarding a particular map construct into 
several groups. 

Otter exhibited different behaviors even in proving the same thesis when 
formulated at different levels of our ‘layered architecture’. For example, consider 
the couple of laws $\overline{\mathbb{1} \circ P} \cap P = \emptyset$ and $P \subseteq \mathbb{1} \circ P$, or the following two: 
$\mathbb{1} \circ P = P \rightarrow (R \circ Q) \cap P = R \circ (Q \cap P)$ and $\mathrm{lAbs}(P) \rightarrow (R \circ Q) \cap P = R \circ (Q \cap P)$ 
(cf. Fig. 7). Experimentation revealed that, in general, the proof turns out to be 
easier when the thesis is expressed by employing the constructs of the higher layer 
(e.g. $\subseteq$ instead of complementation and $\cap$, or $\mathrm{lAbs}(\cdot)$ instead of ‘$\mathbb{1} \circ \cdot$’). Clearly, this is because 
the higher the layer, the greater is the expressiveness of the constructs/operators 
involved and, obviously, the larger is the set of previously proved laws that can 
be usefully employed by Otter. This fact strongly supports our choice of developing 
experimentations in a ‘layered’ fashion. 
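
To make the notion of left-absoluteness concrete, the following Python sketch tests, on random finite relations, a law of the kind just mentioned: when $\mathbb{1} \circ P = P$, intersecting with P commutes with composition on the left. The representation of maps as sets of pairs, the reading of the law with a third map R, and the helper names (`comp`, `is_left_absolute`, `check_left_absolute`) are our own assumptions for illustration.

```python
import random
from itertools import product

U = range(4)                                   # toy domain (assumption)
FULL = set(product(U, repeat=2))               # the universal map

def comp(q, r):
    """Map composition Q o R."""
    return {(a, b) for (a, c) in q for (d, b) in r if c == d}

def is_left_absolute(p):
    """lAbs(P): P coincides with (universal map) o P."""
    return comp(FULL, p) == p

def check_left_absolute(trials=200, seed=1):
    """Test lAbs(P) -> (R o Q) n P = R o (Q n P) on random maps."""
    rng = random.Random(seed)
    for _ in range(trials):
        p0, q, r = ({pair for pair in FULL if rng.random() < 0.4} for _ in range(3))
        p = comp(FULL, p0)                     # (universal map) o P0 is always left-absolute
        assert is_left_absolute(p)
        assert (comp(r, q) & p) == comp(r, q & p)
    return True
```

Intuitively, membership of a pair in a left-absolute map depends only on its second component, which is why the intersection can be pushed inside the composition.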

It is sometimes customary to add to the axiomatization of Fig. 1 the axiom: 

(Simpl) $R \neq \emptyset \rightarrow \mathbb{1} \circ R \circ \mathbb{1} = \mathbb{1}$ 

It can be shown that any theorem that is proved under this ‘simplicity’ 
assumption is also provable without it. Fig. 5 lists some of the consequences of simplicity, 
proved by Otter. 
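
In the intended models, where maps are all the dyadic relations over a nonempty domain, the simplicity condition holds trivially. The short Python sketch below checks this exhaustively for very small domains; the function name `simple_holds` and the set-of-pairs representation are assumptions made only for this illustration.

```python
from itertools import product

def simple_holds(universe):
    """Check that every nonempty map R over `universe` satisfies 1 o R o 1 = 1."""
    full = set(product(universe, repeat=2))

    def comp(q, r):
        return {(a, b) for (a, c) in q for (d, b) in r if c == d}

    pairs = sorted(full)
    # Enumerate all nonempty subsets of the universal map via bit masks.
    for mask in range(1, 2 ** len(pairs)):
        r = {pairs[i] for i in range(len(pairs)) if (mask >> i) & 1}
        if comp(comp(full, r), full) != full:
            return False
    return True
```

The enumeration is exponential in the number of pairs, so it is meant only for domains of two or three elements.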

2 Set-Reasoning in Map Calculus 

Often, a particular class of interpretations can be characterized by imposing 
a collection of map equalities that will serve as proper axioms of the specific 
application of interest. A task of this nature has been undertaken in [6], where 
an equational re-engineering of ZF is developed. 
law 


premises 


len. 


timing 


gen. 


kept 


Inci 


P C P 


Ax,Ii,Ci,Gi,Ni,Ui 


1 


3 


46 


58 




PCQ-^{QCR^PCR) 


" 


8 


4 


362 


107 




P C Q^P-^ C Q-'^ 


" 


7 


7 


1582 


229 




P C Q~y{R C 5->(PnH C QnS)) 


" 


16 


74 


19377 


1638 




P C Q->PnQ = P 


" 


1 


2 


0 


50 




i (Z P 


" 


1 


3 


32 


50 


Inc 2 


P C Q^{R C S^{PoR C QoS}} 


Ax,Ii,Ci,Gi,Ni, 
Ui^ 4 , Yi , Inci 


16 


Im 16s 


2.1-10® 


3425 


Inc 3 


PnQ = P^P C Q 


Ax,Ii,Ci,Gi,Ni, 
Ui,4, Yi, Inci ,2 


1 


3 


1 


54 




PCQ->(QCP^P = Q) 


" 


3 


3 


205 


65 




t n P C P 


" 


4 


5 


413 


90 




I. n P C P-^ 


" 


24 


2m 30s 


3.3-10® 


25386 




P GQ^Q CP 


" 


9 


8 


1641 


268 




P C Q-^(R C S-i{PnS C QnR}} 


" 


10 


11.46s 


76721 


28971 




PCI 


" 


2 


3 


210 


65 




P C Q-^{P G R^{P C QnR}} 


" 


2 


15 


3381 


730 




P C Q^PoP-i C QoQ-i 


" 


2 


25 


6067 


1586 


Inc 4 


F C 1 o P 


Ax, Ii, 2 jCj^ 2 saGi, 
Ni, 2 ,Ui,Yi 


1 


4 


201 


104 




P C P o 1 


" 


1 


6 


164 


99 




PnQ C (loP)nQ 


Ax, Inci,2,3 


3 


5 


818 


235 




P o ((1 o Q) n P) = (1 o Q) n (P o R) 


Ax, lAbsi.lO 


77 


1.57s 1 


17442 


2695 


lnc5 


(PnQ)oP C PoPnQoP 


Ax, Inci,2,3 


6 


18 


5199 


281 




Po(QnP) C PoQnPoP 


Ax, Inci, 2, 3 


6 


18 


5199 


281 


lAbsi 


IAbs( 1) 


Ax,Ii,Ci,Gi, 

Ni,Ui,4,Yi 


1 


1 


48 


48 




IAbs( 0 ) 


" 


1 


2 


11 


47 




IAbs( 1 o P ) 


" 


3 


6 


958 


188 




lAbs(P) ^ lAbs(P) 


" 


16 


24.38s 


257235 


10844 




lAbs(P) ^ lAbs(P) 


Ax,Ii,Ci,Gi, 

Ni,Ui,4,Yi,lAbsi 


21 


76 


18640 


1525 




lAbs(P) ^ IAbs(P o Q) 


" 


6 


12 


2831 


314 




IAbs( P ) A IAbs( Q ) IAbs( P U Q ) 


Ax, U 4 , lAbsi 


5 


99 


8229 


5234 




lAbs(P) A lAbs(Q) IAbs(P n Q) 


Ax, N 4 , U 4 , lAbsi 


4 


21 


3114 


2159 




loP = P^(PoQ)nP = Po(QnP) 


Ax, Cl , Gi , 
Ni,Ui, 4 ,Yi,lAbsi 


139 


18.75s 


172397 


13368 




IAbs( P ) ^ (P o Q) n P = P o (Q n P) 


" 


6 


65 


7659 


4056 




IAbs( P ) A IAbs( Q)^lLo(PnQ) = PnQ 


Ax, lAbsi 


2 


32 


4942 


4733 



Fig. 7. Laws on map inclusion and left absoluteness of maps 



Two derived constructs are of great help in stating the properties of membership:
$\partial(P) =_{\mathrm{Def}} \overline{P \circ \notin}$ and $T(P) =_{\mathrm{Def}} \partial(P) \setminus \overline{P} \circ \in$.

By means of these constructs it is possible to express within the map calculus
a number of axioms of ZF set theory, as briefly summarized in Fig. 9, which
lists the map formulations of extensionality (E), power-set (Pow), union-set
(Un), transitive embedding (T), separation (S), pairing (Pair), finiteness (F),
foundation (R), infinity (I), and replacement (Repl).

A detailed treatment of this map formulation of ZF can be found in [6]. As an
example, let us consider the extensionality axiom. It states that sets whose
elements are the same are identical (see also Fig. 10). This can be rendered by
the following map equality: $T(\ni) = \iota$.

^ Plainly, $a\,\partial(Q)\,b$ and $a\,T(R)\,b$ hold in an interpretation if and only if, respectively,

• all $c$ in $U$ for which $a\,Q\,c$ holds are ‘elements’ of $b$ (in the sense that $c \in b$);

• the elements of $b$ are precisely those $c$ in $U$ for which $a\,R\,c$ holds.
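
The semantic reading given in the footnote can be tried out on a small finite structure. The following Python sketch is only an illustration under stated assumptions: maps are modelled as sets of ordered pairs over a finite universe, and the helper names (`compose`, `complement`, `converse`, `d`, `t`, `extensional`) are hypothetical, not taken from the paper or from Otter. It checks the extensionality equality, i.e. that `t` applied to the converse of membership is the identity, on a tiny universe of hereditarily finite sets.

```python
from itertools import product


def compose(p, q):
    """Relational composition: a (p o q) b iff a p c and c q b for some c."""
    return {(a, b) for (a, c) in p for (c2, b) in q if c == c2}


def complement(p, universe):
    """Complement of a map with respect to the universal map on `universe`."""
    return set(product(universe, repeat=2)) - p


def converse(p):
    """Converse of a map: swap the two components of every pair."""
    return {(b, a) for (a, b) in p}


def d(q, member, universe):
    """a d(Q) b iff every c with a Q c is an element of b."""
    not_member = complement(member, universe)
    return complement(compose(q, not_member), universe)


def t(r, member, universe):
    """a T(R) b iff the elements of b are exactly the c with a R c."""
    return d(r, member, universe) - compose(complement(r, universe), member)


def extensional(member, universe):
    """Check the map equality T(converse of membership) = identity."""
    identity = {(a, a) for a in universe}
    return t(converse(member), member, universe) == identity


# A small transitive universe of hereditarily finite sets (an assumed example).
e = frozenset()
s1 = frozenset({e})
s2 = frozenset({s1})
s3 = frozenset({e, s1})
U = [e, s1, s2, s3]
MEMBER = {(c, b) for c in U for b in U if c in b}
```

On this universe `extensional(MEMBER, U)` holds, whereas a structure with three distinct elements and an empty membership relation fails the test, since all three elements would then have the same (empty) extension.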
group | law | premises | len. | timing | gen. | kept
Tot₁ | Total(𝟙) | Ax, I₁, C₁, G₁, Y₁ | 1 | < 1 | 99 | 34
 | Total(ι) | Ax, I₁, C₁, G₁, Y₁, Tot₁ | 1 | < 1 | 98 | 33
 | Total(r) | Ax, N₄, U₁,₄, lAbs₁ | 5 | 1.37s | 21280 | 2311
 | Total(P∩Q) → Total(Q) | Ax, I₁, C₁, G₁, Y₁, N₁, U₁, Tot₁ | 7 | 12 | 3530 | 133
 | Total(P∘Q) → Total(P) | " | 8 | 11 | 3530 | 128
 | Total(P ∪ P∘𝟙) | " | 22 | 1.07s | 25650 | 1791
 | Total(P Δ P∘𝟙) | " | 53 | 85 | 9111 | 1277
 | Total(P⁻¹) ∨ Total(P) | Ax, C₁, G₁, N₁, Tot₁, Simpl | 4 | 2 | 275 | 92
 | Total(P) ∨ Total(P⁻¹) | " | 4 | 2 | 349 | 107
 | Total(P) ∨ Total(𝟙∘P⁻¹) | " | 6 | 5 | 531 | 132
 | P∩P⁻¹ = 0 → Total(P) | Ax, I₁, C₁, G₁, Y₁, N₁, U₁, Tot₁ | 7 | 6 | 1148 | 225
 | Total(P) ∧ Total(Q) → Total(P∘Q) | " | 7 | 11 | 1584 | 419
 | Total(P) ∧ Total(Q) → Total((P∘𝟙)∩(Q∘𝟙)) | " | 3 | 2 | 8 | 40
 | Total(P) ∧ Total(Q) ∧ Total(P) → Total((P∘Q)∩(P∘𝟙)) | " | 5 | 13 | 1705 | 651
 | P∘Q = 𝟙 → Total(P) ∨ Total(Q) | " | 2 | < 1 | 80 | 50
 | P∘Q⁻¹ = 𝟙 → Total(P) ∧ Total(Q) | " | 5 | 56 | 3130 | 1718
 | P∘Q = 𝟙 → Total(P) ∧ Total(Q⁻¹) | " | 5 | 5 | 334 | 114
 | P∩Q = P ∧ Total(P) → Total(Q) | " | 2 | 4 | 89 | 76
 | P∘Q⁻¹ = 𝟙 ∧ Total(P) → P∘(Q⁻¹∘P⁻¹) = 𝟙 | " | 5 | 3 | 189 | 83
 | P∘Q⁻¹ = 𝟙 → Total(P∩Q) | " | 2 | 1 | 11 | 8
 | P∘Q⁻¹ = 𝟙 ∧ Total(P) → Total(P∩(P∘Q)) | " | 7 | 31 | 10191 | 568
 | Total(P) ∧ Q∘(P∘S) = 𝟙 → Total(P∘(Q∘(P∘S))) | " | 2 | 1 | 9 | 23
 | P∘Q⁻¹ = 𝟙 ∧ Total(P) ∧ Total(S) → Total((S∘P)∩(P∘Q)) | " | 45 | 9m 12s | 6.6·10⁶ | 30429
 | lAbs(P) → (P = 0) ∨ Total(P) | Ax, C₁, lAbs₁, Simpl | 7 | 1.91 | 44040 | 1809
Func₁ | Func(0) | Ax, I₁, C₁, G₁ | 2 | 2 | 92 | 39
 | Func(ι) | Ax, I₁, C₁, G₁ | 2 | 2 | 110 | 44
 | Func(P) → Func(P∩Q) | Ax, I₁, Inc₁,₂,₃ | 9 | 74 | 20065 | 913
 | Func(P) ∧ Func(Q) ∧ P ⊆ Q ∧ Q ⊆ P∘𝟙 → P = Q | Ax, I₁,₂, C₁,₂, S₁, N₁,₂, Y₁ | 288 | 51m 36s | 3.4·10⁷ | 24052

Fig. 8. Totality and functionality of maps 
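
Laws of this kind can be sanity-checked semantically on a very small universe before (or after) asking a prover for an equational derivation. The Python sketch below is such a check, not a reconstruction of the Otter runs. It assumes the usual relational readings, which are not spelled out in this section: Total(P) is taken to mean P∘𝟙 = 𝟙, and Func(P) to mean P⁻¹∘P ⊆ ι. All names (`all_maps`, `total`, `func`, `law_holds`) are hypothetical.

```python
from itertools import combinations, product

U = (0, 1)
PAIRS = list(product(U, repeat=2))
ONE = frozenset(PAIRS)                   # universal map
IOTA = frozenset((a, a) for a in U)      # identity map


def all_maps():
    """Enumerate every binary relation on U (16 of them for |U| = 2)."""
    for k in range(len(PAIRS) + 1):
        for subset in combinations(PAIRS, k):
            yield frozenset(subset)


def compose(p, q):
    """Relational composition of two maps."""
    return frozenset((a, b) for (a, c) in p for (c2, b) in q if c == c2)


def converse(p):
    """Converse of a map."""
    return frozenset((b, a) for (a, b) in p)


def total(p):
    # Assumed reading: Total(P) iff P o 1 = 1 (every element has a P-image).
    return compose(p, ONE) == ONE


def func(p):
    # Assumed reading: Func(P) iff P^-1 o P is included in the identity.
    return compose(converse(p), p) <= IOTA


def law_holds(law, arity):
    """Check a law for every assignment of maps on U to its variables."""
    maps = list(all_maps())
    return all(law(*args) for args in product(maps, repeat=arity))
```

For instance, `law_holds(lambda p, q: not (total(p) and total(q)) or total(compose(p, q)), 2)` confirms one of the totality laws on the two-element universe. Passing such a check shows only that the law has no counterexample of that size; it says nothing about derivability from the axioms.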



An alternative formulation of extensionality. A useful variant of the
extensionality axiom is the scheme $\mathrm{Func}(T(P))$, where $P$ ranges over all map
expressions. Our first task in automated set-reasoning consists in proving the
equivalence of the two formulations of (E), i.e., that $T(\ni) = \iota$ holds if and
only if $\mathrm{Func}(T(P))$ does.

Otter was unable to prove this theorem in a single shot. Hence we had to
split the theorem into two. First, we got a proof of $\mathrm{Func}(T(P)) \vdash T(\ni) = \iota$,
via the following sequence of intermediate results:



law | length | timing | note
ι ⊆ T(∋) | 3 | 4 | by using I₁, C₁,₃, G₁, N₁, U₁,₂,₃,₄, Y₁
Func(T(∋)) | | | immediately from the hypotheses
T(∋) ⊆ ι∘𝟙 | 3 | 2 | by Ax, Inc₁, Func₁
T(∋) = ι | 0 | < 1 | immediately from Func₁
(E) | T(∋) = ι
(Pow) | Total ^ d{ ) )
(Un) | Total(∂(∋∘∋))
(T) | Total (^6ornd( 303
(S) | Total funcPart( Q )o 3 DP j j
(Pair)₁,₂,₃,₄ | π₀⁻¹∘π₁ = 𝟙, Func(π₀), Func(π₁), ∈∘∋ = 𝟙
(Pair)₅ | π₀∘π₀⁻¹ ∩ π₁∘π₁⁻¹ \ ι = 0
(F) | L C £ 0 ( 6 n ( ( t U ^06 ) t 0 ) ) I ^
(R) | Total) 3 o£U 3 \ 3o6 )
(I) | Total £o(^ d( 303 ) n d{ 303 )”^\ 6 \ 3 \t\ 3 o 6 A 3 o 6 jj
(Repl) | Total ^< 9 ^( 7 too 3 o 7 r)j'''‘n tti 0 ) 0 funcPart) Q ) j j



Fig. 9. Axioms of set theory within map calculus. 




Fig. 10. Specification of a weak set theory in first-order logic and in map algebra.



The converse, i.e. $T(\ni) = \iota \vdash \mathrm{Func}(T(P))$, was proved as follows:



law | length | timing | note
T(P)⁻¹∘T(P) ⊆ ^oP-^oPoe | 10 | 12.29s | by Ax, G₁, N₁
T(P)⁻¹∘T(P) ⊆ ^oP-ioPo0 | 9 | 12.38s | by Ax, G₁, N₁
T(P)⁻¹∘T(P) ⊆ T(∋) | 3 | 2 | by Inc₁
T(P)⁻¹∘T(P) ⊆ ι | 1 | < 1 | by Inc₁



Designing pairs of conjugated projections. In [4] a possible choice was
proposed for two maps $\pi_0, \pi_1$ which fulfill the pairing axiom, provided that the
axioms of a weak theory of sets (i.e., extensionality, null set, single-element
addition and removal) are assumed (cf. Fig. 10, where uppercase letters stand
for variables ruled by universal quantifiers). The authors presented an Otter-based
proof that those two specific maps satisfy (Pair)$_{1,2,3}$. In that context, the
approach to experimentation was aimed at ‘miniaturizing’ the obtained proofs,
i.e., at developing the proofs by starting with the raw axiomatization of Fig. 1,
without the explicit introduction of defined constructs, and by strictly interacting
with and guiding Otter, to make it perform only the essential inference steps.
It is claimed in [7] that (W) is not, when taken alone, expressible in map
algebra.^ Nevertheless, in [4] it is shown that taken together with (N) these
axioms enable one to build the pair $\{\, Y \setminus \{X\},\ Y \cup \{X\} \,\}$ out of any given
sets $X$ and $Y$. This fact, thanks to (E), can be stated as

$\exists d\,(Y \in d \wedge \forall u\,(u = X \leftrightarrow \exists v\,\exists w\,(u \in v \in d \wedge u \notin w \in d)))$,

and in turn (again with the contribution of (E)) it yields (N), (W),
and (L). Consequently, the set axioms of Fig. 10.a can be translated as
shown in Fig. 10.b. The key point consists in observing that the map
1 / =Def n valve(6oC, occurring in (WL), enables a quick implementation
of the projections

ttq =Def , 7Tl =Def V3 1 V6^ ^ ( 6 o 6 , «^) •

The main result of [4] consisted in proving within map algebra (under minimal
assumptions on membership) that $\pi_0$ and $\pi_1$ designate conjugated projections.
As mentioned, the important consequence is that the equational specification of
our assumptions on membership has the same deductive power as its counterpart
formulated in quantified first-order logic; this follows from results in [15].

The experimentation reported in [4] proceeded by proving a number of inter- 
mediate results ultimately yielding the desired proof. The most relevant of them 
is the following lemma: 

Lemma. (Functionality) $Q \circ Q^{-1} \subseteq \iota$ entails $\mathrm{valve}(P, Q) \circ \mathrm{valve}^{-1}(P, Q) \subseteq \iota$.

This lemma mainly relies on various elementary Boolean identities, and on
some obvious consequences of the Peircean axioms (i.e., the logical axioms
regarding $\circ$, ${}^{-1}$, and $\iota$). The only non-obvious laws on maps needed are the
so-called cycle law (cf. Fig. 5) and Dedekind law (cf. [13]):

$P \circ Q \cap R \subseteq (P \cap R \circ Q^{-1}) \circ (Q \cap P^{-1} \circ R)$.
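
The Dedekind law holds for arbitrary binary relations, so it can be spot-checked on concrete finite maps. The following Python sketch does this on randomly generated relations; it is an illustration only and is unrelated to the equational derivation carried out with Otter. Maps are modelled as sets of pairs, and the function names and the `density`, `trials`, `size` and `seed` parameters are assumptions made for this example.

```python
import random
from itertools import product


def compose(p, q):
    """Relational composition: a (p o q) b iff a p c and c q b for some c."""
    return {(a, b) for (a, c) in p for (c2, b) in q if c == c2}


def converse(p):
    """Converse of a map."""
    return {(b, a) for (a, b) in p}


def dedekind_holds(p, q, r):
    """Check P o Q n R <= (P n R o Q^-1) o (Q n P^-1 o R) on concrete maps."""
    lhs = compose(p, q) & r
    rhs = compose(p & compose(r, converse(q)), q & compose(converse(p), r))
    return lhs <= rhs


def random_map(universe, rng, density=0.4):
    """Pick each pair over `universe` independently with probability `density`."""
    return {pair for pair in product(universe, repeat=2) if rng.random() < density}


def check_random(trials=200, size=4, seed=0):
    """Test the law on `trials` random triples of maps over a `size`-element universe."""
    rng = random.Random(seed)
    universe = range(size)
    return all(
        dedekind_holds(*(random_map(universe, rng) for _ in range(3)))
        for _ in range(trials)
    )
```

A call such as `check_random()` exercises the inclusion on 200 random triples; a failure would point to a transcription error in the law rather than to a flaw in the calculus.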

A ‘miniaturized’ derivation of the Dedekind law was obtained from the bare
axioms in Fig. 1. It consists of 25 verifications, each with an average CPU-time
cost of 6 to 8 seconds (depending on the machine).^ It is worth stressing that
these 25 steps included the proofs of basic facts such as some of the laws on
symmetric difference, intersection, and composition already seen in Figures 2 and 2.

While the functionality lemma easily allowed Otter to prove (Pair)$_{2,3}$, in
order to prove (Pair)$_1$ it was necessary to proceed as follows. First, a
temporary assumption was made that a singleton set {a} can be formed out of
any given a. This assumption can be stated more precisely as follows:

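(Sng) $\mathrm{sng} \circ \mathbb{1} = \mathbb{1}$, where $\mathrm{sng} =_{\mathrm{Def}} \in \setminus \bar{\iota} \circ \in$,

holds along with (WL). Hence we have the following lemma:

Lemma. Assume (Sng) and (WL). It follows that $\pi_0^{-1} \circ \pi_1 = \mathbb{1}$.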

To prove this result it turned out that Otter had to extensively use map- 
inclusion laws drawn from the list shown in Sec. 1. 

The subsequent step consists in proving that it is actually possible to do 
without a postulate of singleton formation. Verifying this claim amounted to 
getting an automated proof of the derivability of (Sng) from (WL) and (N). 

^ A proof of this fact, which unfortunately remains quite obscure to the authors, is 
supplied by [7]. 

® These verifications were run on a G3 Macintosh and under Linux. 
In this case, an analysis of Otter’s proof showed that the most useful intermediate
results (implicitly proved in the main proof) were the laws on totality.



Totality of some elementary relations on sets. By using the laws of Sec. 1,
Otter was able to prove the totality of a number of relations on sets. We exhibit
below an excerpt of the results we obtained. The laws of Fig. 8 intervene crucially
in these proofs.

• $\mathrm{Total}(\in \circ \ni)$. Thanks to (Pair), this reduces to proving that $\mathrm{Total}(\mathbb{1})$ holds,
which was immediately proved from the laws on totality.

• $\mathrm{Total}(\in \circ \mathbb{1})$. It follows from the previous result and from the laws in Fig. 8.
It was proved in 0.02 seconds; the proof-length is 3.

• $\mathrm{Total}(\in)$. It follows from the previous results and from the laws in Fig. 8. It
was proved in 0.02 seconds; the proof-length is 1.



A general technique for proving totality of set constructors. The next
task consists in obtaining the proof of a general law for deriving the totality
of expressions of the form $\mathrm{Total}(T(R))$. This law will give us the capability of
defining a number of set-constructs (cf. [5]). Let us start with two useful lemmas.

Lemma. For any $P, Q$ such that

$P^{-1} \circ Q \subseteq \ni$ and $\mathrm{Func}(\pi_1)$ (5)

it holds that:

$(P \circ \pi_0^{-1} \cap \pi_1^{-1}) \circ T(\pi_0 \circ \ni \cap\, \pi_1 \circ Q) \subseteq T(Q)$. (6)

In the following we describe Otter’s proof. The thesis (6) can be rewritten as

$(P \circ \pi_0^{-1} \cap \pi_1^{-1}) \circ T(\pi_0 \circ \ni \cap\, \pi_1 \circ Q) \subseteq \overline{Q \circ \notin} \cap \overline{\overline{Q} \circ \in}$ (7)

By assuming the hypothesis (5).1, Otter was able to prove the following
intermediate result: $(\pi_0 \circ P^{-1} \cap \pi_1) \circ Q \subseteq \pi_0 \circ \ni \cap\, \pi_1 \circ Q$. Otter proved this result
in 0.31 seconds; it generated 4162 clauses (the number of kept clauses was 915).
The proof-length was 4. The proof was easily obtained by extensive use of the
map-inclusion laws (cf. Fig. 7). The main settings used to drive Otter required
that any generated clause consisting of more than two literals, or having more than
two distinct variables, be discarded. From (7), by exploiting the cycle law and
the laws on inclusion, Otter easily proved that:

$(P \circ \pi_0^{-1} \cap \pi_1^{-1}) \circ T(\pi_0 \circ \ni \cap\, \pi_1 \circ Q) \subseteq \overline{Q \circ \notin}$ (8)

The proof was found in 1.30 seconds (its length was 9), by generating 13729 unit
clauses (max_literals=1 and max_distinct_vars=3) and keeping 2652 clauses.

On the other hand, the following map inclusion was proved by assuming the
functionality of $\pi_1$ (cf. hypothesis (5).2), in 0.81 seconds. The proof-length was
13 (the generated and the kept clauses were 9848 and 2097, respectively):

$(P \circ \pi_0^{-1} \cap \pi_1^{-1}) \circ T(\pi_0 \circ \ni \cap\, \pi_1 \circ Q) \subseteq \overline{\overline{Q} \circ \in}$ (9)

Putting together the two results (8) and (9), in order to obtain the thesis (6),
took 0.08 seconds (two inferences, by hyper-resolution).

Lemma. Assume (Pair)$_{1,2}$ and (S). Then for any $P, Q$

$\mathrm{Total}(P) \vdash \mathrm{Total}((P \circ \pi_0^{-1} \cap \pi_1^{-1}) \circ T(\pi_0 \circ \ni \cap\, \pi_1 \circ Q))$.

Otter proved this lemma (by proving two intermediate results) in a total time
of 0.24 seconds. On this ground, the following proposition was proved.

Proposition. Assume (Pair)$_{1,2,3}$ and (S). Then for any $P, Q$

$\mathrm{Total}(P),\ P^{-1} \circ Q \subseteq \ni\ \vdash\ \mathrm{Total}(T(Q))$. (10)

This proposition was proved in two stages. We first drew from the hypotheses
a series of intermediate lemmas yielding
$(P \circ \pi_0^{-1} \cap \pi_1^{-1}) \circ T(\pi_0 \circ \ni \cap\, \pi_1 \circ Q) \subseteq T(Q)$.
The thesis then readily followed, with the help of the laws on totality. The
overall time of this proof was 3.57 seconds.

By using this general tactic, Otter proved the totality of several map
expressions, certifying in this way that these expressions characterize legal
operations on sets:

• $\mathrm{Total}(T(\iota))$. It defines the singleton operation $a \mapsto \{a\}$. Its totality was
proved in 0.05 seconds (length:?, generated:768, kept:108), by using the result
previously obtained: $\mathrm{Total}(\in)$ (Otter instantiated $P = \in$ and $Q = \iota$ in
proposition (10)).

• $\mathrm{Total}(T(0))$. It characterizes the nullset constructor: $a \mapsto \{\}$. As in the
previous case, its totality was proved quickly, in 0.04 seconds (length:3,
generated:335, kept:52). Notice that this thesis was also proved without resorting
to the above proposition, but in this case Otter’s task was more difficult: the
proof took much longer, 1.15 seconds. Otter used the laws in
$C_1, I_{1,2}, G_1, N_{1,2}$ and in particular those in $\mathrm{Tot}_1$; it generated 21521 clauses,
keeping 343 of them.

• Consider the two axioms Total(9( ^oE ) ) and Total(9( 3 o 3 ) ) (cf. Fig. 9). 

Otter was able to prove their stronger version: Total) T( ) ) and 

Total ( T( 303 ) ) by using, in particular, the law (10) and the cycle law. The 
first proof was generated in 0.11 seconds (length:4, generated:2616, kept:265). 
The strong version of the second axiom was proved in 17.88 seconds (length: 6 , 
generated:386130, kept:5070). 

• A more general result was also proved. Namely, under the axioms (Pair) and
(S), Otter proved this property of totality: $\mathrm{Total}(\partial(P)) \vdash \mathrm{Total}(T(P))$.
The proof was found in 0.12 seconds (length:4, generated:2616, kept:265) by
using the above proposition, the cycle law, and the laws of Fig. 8.

Conclusions 

We reported on an initial experimentation activity in automated set-reasoning,
based on a formalization of axiomatic ZF theory within a ground equational
framework. This approach made it possible to profitably exploit traditional
first-order theorem provers for experimenting in set reasoning. The main efforts
were devoted to developing a structured methodology for experimentation. This
approach allowed Otter to prove several theorems of map algebra as well as of set
theory that it was not possible to prove (with Otter) in the absence of a layered
methodology. Clearly, this approach is not specific to Otter and can be adapted to
other (first-order) automated theorem provers as well.

We took the first steps in equational set-reasoning by setting the ground
for further studies and experimentation. Analogous proposals for the automation
of set-reasoning have been developed by other researchers (cf. [3,11,12],
among others). A comparison with these approaches certainly deserves further
investigation and is a matter of ongoing work.

The ultimate purpose of the research we presented here should consist in as- 
sessing what can be achieved nowadays by applying unspecialized proof methods 
of automated deduction in the set-theoretical context. The knowledge we gain 
can be used both to refine our approach based on first-order theorem provers, 
and as an aid to single out which would be the most promising specialized meth- 
ods to be employed in the realization of an ad hoc basic inference machinery for 
(and hence, for set theory). 

Moreover, the assessment of the exact kinship of our own formulation of 
ZF with ZF proper on the one hand, and with NGB on the other, is a crucial 
theoretical issue and constitutes a challenging and promising starting point for 
future research. 
Abstract. A boolean formula in conjunctive normal form (CNF) F is
refuted by literal-once resolution if the empty clause is inferred from
F by resolving on each literal of F at most once. Literal-once resolution
refutations can be found nondeterministically in polynomial time,
though this restricted system is not complete. We show that despite
the weakness of literal-once resolution, the recognition of CNF-formulas
which are refutable by literal-once resolution is NP-complete. We study
the relationship between literal-once resolution and read-once resolution
(introduced by Iwama and Miyano). Further we answer a question posed
by Kullmann related to minimal unsatisfiability.



1 Introduction 

Resolution is a method for establishing the unsatisfiability of formulas in
conjunctive normal form (CNF), based on the resolution rule: if $C_1 \cup \{\ell\}$ and
$C_2 \cup \{\bar{\ell}\}$ are clauses, then the clause $C_1 \cup C_2$ may be inferred, resolving on the
literal $\ell$. A resolution refutation of a CNF-formula F is a derivation of the empty
clause □ from F, using the resolution rule. It is well-known that resolution is sound
and complete, i.e., a CNF-formula is unsatisfiable if and only if there is a
resolution refutation of it ([14]). Resolution refutations can be represented as binary
trees, where the leaves are labeled by clauses of F (see Figure 1 for an example).
Unfortunately, the size of a shortest resolution refutation of a CNF-formula F
can be exponential in the number of clauses of F ([6,7]). Therefore, considerable
effort has been made to identify restricted (and incomplete) classes of resolution
refutations where the size of refutations is polynomially bounded by the size of
input formulas (see [10] for a survey). One of the best known examples is unit
resolution, where the resolution rule is only applied to pairs of clauses $C_1, C_2$ if
$C_1$ or $C_2$ is a unit clause (i.e., a singleton). Unit resolution is no longer complete,
but the class of formulas which can be refuted by unit resolution can be
recognized in linear time (see, e.g., [10]).

{x,y}   {x̄,y}        {x,ȳ}   {x̄,ȳ}
    \   /                \   /
     {y}                  {ȳ}
        \                /
                □

Fig. 1. A resolution refutation of F = {{x, y}, {x̄, y}, {x, ȳ}, {x̄, ȳ}}.

Fig. 2. A resolution refutation which is not read-once.
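
To make the notion of unit resolution concrete, here is a minimal Python sketch that decides refutability by unit resolution through naive saturation. It is not the linear-time procedure referred to in [10]; it simply keeps resolving unit clauses against the other clauses until the empty clause appears or nothing new can be derived. The encoding of literals as nonzero integers (a negative number standing for a negated variable) and the function name `unit_refutable` are assumptions of this example.

```python
def unit_refutable(formula):
    """Return True iff repeated unit resolution derives the empty clause.

    `formula` is an iterable of clauses; each clause is an iterable of
    nonzero integers, where -n denotes the complement of the literal n.
    """
    clauses = {frozenset(c) for c in formula}
    while True:
        if frozenset() in clauses:
            return True
        # Literals that occur as unit clauses.
        units = {next(iter(c)) for c in clauses if len(c) == 1}
        # Resolve every unit {l} against every clause containing -l.
        new = {c - {-l} for c in clauses for l in units if -l in c}
        if new <= clauses:          # saturation reached, no empty clause
            return False
        clauses |= new
```

For example, `unit_refutable([[1], [-1, 2], [-2]])` succeeds, while the unsatisfiable formula of Fig. 1, encoded as `[[1, 2], [-1, 2], [1, -2], [-1, -2]]`, is not refutable this way because it contains no unit clause, which illustrates the incompleteness mentioned above.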

Iwama and Miyano ([8]) considered read-once resolution, where each clause
of the input formula must be used at most once in a refutation; i.e., two leaves
of the resolution tree may not be labeled by the same clause. (In [8] resolution
refutations are also considered where clauses of the input formula may be used
more than once, but the number of repetitions is restricted.) For example, the
refutation exhibited in Figure 2 is not read-once, since the clause {x, z} occurs
at two leaves (in fact, it can be shown that for F = {{x,y,z}, {x,z}, {x,y},
{x,y,z}, {y,z}} no read-once resolution refutation exists, despite F being
unsatisfiable; see [8] or Proposition 1 below). It is easy to see that the size of a
read-once resolution refutation is polynomially bounded by the size of the input
formula. However, in [8] it is shown that, in spite of the shortness of read-once
resolution refutations, it is NP-complete to recognize formulas which can be
refuted by read-once resolution.

If we modify the above example by adding two clauses $\{w, x, z\}$ and $\{\bar{w}, x, z\}$
to F, then we get a read-once resolution refutation (exhibited in Figure 3). There
are still two occurrences of {x,z}, but one occurrence became an interior vertex of
the tree, and so the refutation became read-once. Thus, it is natural to consider
resolution trees where no clause appears more than once at any position in the
resolution tree. We call such refutations strict read-once. It can be shown that
there are CNF-formulas which are refutable by read-once resolution, but not
by strict read-once resolution (see Proposition 1 below). Since strict read-once
resolution is therefore weaker than read-once resolution, it is conceivable that
refutability by strict read-once resolution can be decided in polynomial time.
We will show, however, that recognition of formulas refutable by strict read-once
resolution is NP-complete.
Fig. 3. A resolution refutation obtained from Figure 2; it is read-once, but not strict
read-once.



Going one step further, we also consider a type of resolution which is even 
weaker than strict read-once resolution: a resolution tree is literal-once if it does 
not contain two or more vertices whose clauses are inferred by resolving on the 
same literal. For example, the resolution refutation depicted in Figure 1 is strict 
read-once, but it is not literal-once, since clauses at two positions are inferred 
by resolving on the same literal x. However, it is easy to see that every literal- 
once resolution refutation is a (strict) read-once resolution. The main result of 
this paper is the intractability of literal-once resolution; i.e., it is NP-complete 
to recognize CNF-formulas which are refutable by literal-once resolution. 

Furthermore, we show that intractability of read-once resolution can be ob- 
tained as corollary of our main result. This fact may be of interest, since Iwama 
and Miyano obtain the quoted result solely by presenting a single example with- 
out giving an accurate proof. 

In [11] Kullmann asked for the computational complexity of finding a subset
F' of a given formula F such that

(i) F' is minimal unsatisfiable (F' is unsatisfiable, but every proper subset of
F' is satisfiable), and

(ii) F' has exactly one more clause than variables.

We denote by MU(1) the class of formulas F' satisfying (i) and (ii). This class
is of special interest; for example, every minimal unsatisfiable Horn formula
belongs to MU(1) ([4]). We show that F has a subset $F' \in \mathrm{MU}(1)$ if and only if F
is refutable by literal-once resolution. Hence the intractability of Kullmann’s
problem follows from the NP-completeness of refutability by literal-once
resolution.
2 Notation

2.1 Digraphs

We denote a digraph D by an ordered pair (V, A) consisting of a finite nonempty
set V of vertices and a set A of arcs; an arc is an ordered pair (u, v) of distinct
vertices $u, v \in V$. Let D = (V, A) be a digraph and $v \in V$. We denote the sets
of incoming and outgoing arcs of v by $\mathrm{in}(v) = \{ (u, w) \in A \mid w = v \}$ and
$\mathrm{out}(v) = \{ (u, w) \in A \mid u = v \}$, respectively. For $(u, v) \in A$ we say that u is a
predecessor of v and that v is a successor of u.

A digraph T = (V, A) is an in-tree if there is exactly one vertex v without
successors (the root of T), and for every vertex $w \in V$ there is exactly one
(directed) path from w to v. Consequently, every vertex which is different
from the root has exactly one successor. A vertex without predecessors is a leaf.
An in-tree T is binary if every non-leaf has exactly two predecessors. Note that a
binary in-tree with k leaves has $2k - 1$ vertices. For graph theoretic terminology
not defined here, the reader is referred to [2].



2.2 CNF-Formulas 

Let var be a set of boolean variables. A literal $\ell$ is an object of the form $x$ or $\bar{x}$
for $x \in \mathrm{var}$; in the first case we call $\ell$ positive, in the second case negative; for a
negative literal $\ell = \bar{x}$, $x \in \mathrm{var}$, we put $\bar{\ell} = x$. Literals $\ell$ and $\bar{\ell}$ are complements of
each other. If $x$ is a variable and $\ell \in \{x, \bar{x}\}$, then we call $x$ the variable of $\ell$ and
write $\mathrm{var}(\ell) = x$. A clause is a finite set of literals without complements. The
empty clause is denoted by □. For a clause C we put $\mathrm{var}(C) = \{ \mathrm{var}(\ell) \mid \ell \in C \}$.
A CNF-formula (or formula, for short) is a finite set of clauses. For a formula F
we put $\mathrm{var}(F) = \bigcup_{C \in F} \mathrm{var}(C)$. A literal $\ell$ is a pure literal of F if $\ell \in \bigcup_{C \in F} C \not\ni \bar{\ell}$.
A formula F is Horn if every clause in F contains at most one positive literal.

A truth assignment t to a formula F is a map $t : \mathrm{var}(F) \to \{0, 1\}$. Let t be a
truth assignment to F; we put $t(\bar{x}) = 1 - t(x)$ for $x \in \mathrm{var}(F)$, and we say that
t satisfies a clause $C \in F$ if $t(\ell) = 1$ for at least one literal $\ell \in C$. Furthermore,
we say that t satisfies F if t satisfies all clauses of F. A formula F is satisfiable if
there is a truth assignment which satisfies F; otherwise F is called unsatisfiable.
We denote the set of all unsatisfiable formulas by UNSAT.

2.3 Resolution 

Let $C_1, C_2$ be two clauses. If there is exactly one literal $\ell$ such that $\ell \in C_1$ and
$\bar{\ell} \in C_2$ then we call the clause $C = (C_1 \setminus \{\ell\}) \cup (C_2 \setminus \{\bar{\ell}\})$ the resolvent of $C_1$ and
$C_2$; in this case we also say that C is obtained from $C_1, C_2$ by resolving on $\ell$.
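
The definition of the resolvent translates directly into code. The sketch below is a hedged illustration in Python; the encoding of literals as nonzero integers (with -n as the complement of n) and the function name `resolvent` are choices made for this example only. Note that, following the definition above, the resolvent is undefined unless exactly one literal of the first clause has its complement in the second.

```python
def resolvent(c1, c2):
    """Return (resolvent, literal) following the definition, or None.

    Clauses are iterables of nonzero integers; -n is the complement of n.
    The result is None unless exactly one literal l has l in c1 and -l in c2.
    """
    c1, c2 = frozenset(c1), frozenset(c2)
    clashing = [l for l in c1 if -l in c2]
    if len(clashing) != 1:
        return None
    l = clashing[0]
    return (c1 - {l}) | (c2 - {-l}), l
```

For instance, `resolvent({1, 2}, {-1, 2})` yields the clause `{2}` obtained by resolving on literal 1, whereas `resolvent({1, 2}, {-1, -2})` is undefined because two literals clash.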

Let $T_0 = (V, A)$ be an in-tree and $\lambda$ a labeling of its vertices such that $\lambda(v)$
is a clause for every $v \in V$. We call $T = (V, A, \lambda)$ a resolution tree if for every
vertex $v \in V$ with predecessors $v_1, v_2$ it holds that $\lambda(v)$ is the resolvent of $\lambda(v_1)$
and $\lambda(v_2)$. Let $T = (V, A, \lambda)$ be a resolution tree and $v \in V$. If $v$ is a leaf,
then we put $\mathrm{rlit}(v) = \emptyset$; otherwise $v$ has two predecessors, say $v_1$ and $v_2$; we put
$\mathrm{rlit}(v) = (\lambda(v_1) \cup \lambda(v_2)) \setminus \lambda(v)$. We call the elements of $\mathrm{rlit}(v)$ resolution literals
of $v$. A clause C is a premise of a resolution tree T if $\lambda(v) = C$ for some leaf v of
T. We write pre(T) for the set of all premises of T. A clause C is the conclusion
of T if $\lambda(v) = C$ for the root v of T; in this case we write con(T) = C. A
resolution tree T is a resolution refutation if con(T) = □. Let F be a formula
and T a resolution refutation. If $\mathrm{pre}(T) \subseteq F$ then we say that F is refuted by
T, or that T is a resolution refutation of F. A resolution tree $T = (V, A, \lambda)$ is
trivial if $|V| = 1$. Clearly, a formula F is refuted by the trivial resolution tree
$T = (\{v\}, \emptyset, \lambda)$ if and only if $\lambda(v) = \Box \in F$.

For a resolution tree $T = (V, A, \lambda)$ and $v \in V$ we define $T_v$ to be the resolution
tree $(V', A', \lambda')$ where $(V', A')$ is the maximal subtree of $(V, A)$ with root $v$ and
$\lambda'$ is the restriction of $\lambda$ to $V'$.

It is well-known that a formula F is unsatisfiable if and only if it can be 
refuted by some resolution refutation T. 



3 Restricted Types of Resolution 

Read-Once Resolution. A resolution tree $T = (V, A, \lambda)$ is read-once if
$\lambda(v) \neq \lambda(w)$ for any two distinct leaves v, w of T. We denote by ROR
the class of all formulas refutable by read-once resolution refutations. (ROR
corresponds to the class which is denoted by R(0) in [8].)

Strict Read-Once Resolution. A resolution tree $T = (V, A, \lambda)$ is strict read-once
if $\lambda(v) \neq \lambda(w)$ for any two distinct vertices v, w of T. We denote
by SROR the class of all formulas refutable by strict read-once resolution
refutations.

Literal-Once Resolution. A resolution tree $T = (V, A, \lambda)$ is literal-once if
$\mathrm{rlit}(v) \neq \mathrm{rlit}(w)$ for any two distinct non-leaves v, w of T. We denote by LOR
the class of all formulas refutable by literal-once resolution refutations.
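
The three restrictions can be checked mechanically on a given resolution tree. The following Python sketch is one possible way to do so, under assumptions made only for this illustration: a tree is written as nested pairs, a leaf is a list of integer literals (with -n as the complement of n), and the names `resolve`, `analyse` and `classify` are hypothetical. The example encodes the refutation of Fig. 1 with x = 1 and y = 2.

```python
def resolve(c1, c2):
    """Resolve two clauses; return the resolvent and the set rlit = {l, -l}."""
    clashing = [l for l in c1 if -l in c2]
    if len(clashing) != 1:
        raise ValueError("clauses have no resolvent")
    l = clashing[0]
    return (c1 - {l}) | (c2 - {-l}), frozenset({l, -l})


def analyse(tree):
    """Return (conclusion, leaf labels, all vertex labels, rlit sets of non-leaves)."""
    if not isinstance(tree, tuple):            # a leaf: just a clause
        clause = frozenset(tree)
        return clause, [clause], [clause], []
    left, right = tree
    c1, leaves1, labels1, rlits1 = analyse(left)
    c2, leaves2, labels2, rlits2 = analyse(right)
    clause, rlit = resolve(c1, c2)
    return (clause, leaves1 + leaves2,
            labels1 + labels2 + [clause], rlits1 + rlits2 + [rlit])


def all_distinct(items):
    """True iff no item occurs twice in the list."""
    return len(items) == len(set(items))


def classify(tree):
    """Report which of the restrictions defined above the tree satisfies."""
    conclusion, leaves, labels, rlits = analyse(tree)
    return {
        "refutation": conclusion == frozenset(),
        "read_once": all_distinct(leaves),
        "strict_read_once": all_distinct(labels),
        "literal_once": all_distinct(rlits),
    }


# The refutation of Fig. 1, with x = 1 and y = 2.
FIG1 = (([1, 2], [-1, 2]), ([1, -2], [-1, -2]))
```

Calling `classify(FIG1)` reports a strict read-once refutation that is not literal-once, since two inner vertices resolve on the variable x. This only checks a given tree; deciding whether some tree with the required property exists is the hard problem studied in the next section.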

Proposition 1 $\mathrm{LOR} \subsetneq \mathrm{SROR} \subsetneq \mathrm{ROR} \subsetneq \mathrm{UNSAT}$.

Proof. If a resolution refutation is literal-once, then it is obviously strict
read-once; thus $\mathrm{LOR} \subseteq \mathrm{SROR}$. Consider the formula
$F = \{\{x, y\}, \{\bar{x}, y\}, \{x, \bar{y}\}, \{\bar{x}, \bar{y}\}\}$. Figure 1 shows a strict read-once resolution
refutation T of F, hence $F \in \mathrm{SROR}$. (We note in passing that F belongs to a subclass
of minimal unsatisfiable formulas characterized in [9].) However, T is not
literal-once. It is easy to see that there is no literal-once resolution refutation of
F at all. Hence $\mathrm{LOR} \subsetneq \mathrm{SROR}$ follows.

We have $\mathrm{SROR} \subseteq \mathrm{ROR}$ by definition. Consider the formula $F = \{C_1, \ldots, C_5\}$
with

$C_1 = \{x, z\}$, $C_4 = \{x, y, z\}$,

$C_2 = \{x, y\}$, $C_5 = \{x, y, z\}$,

$C_3 = \{y, z\}$.

Figure 2 exhibits a resolution refutation of F, hence $F \in \mathrm{UNSAT}$. We show that
$F \notin \mathrm{ROR}$. Consider a resolution refutation T of F with root v, and let $v_1, v_2$ be the
predecessors of v. Clearly $|\mathrm{con}(T_{v_1})| = |\mathrm{con}(T_{v_2})| = 1$. However, no pair of clauses
$C', C'' \in F$ has a resolvent C with $|C| = 1$. Thus $|\mathrm{pre}(T_{v_1})|, |\mathrm{pre}(T_{v_2})| \ge 3$. Since
$|F| = 5$ it follows that $\mathrm{pre}(T_{v_1}) \cap \mathrm{pre}(T_{v_2}) \neq \emptyset$. Consequently, T is not read-once.
Hence $F \notin \mathrm{ROR}$ and so $\mathrm{ROR} \neq \mathrm{UNSAT}$.

Let $W_1 = \{w, x, z\}$, $W_2 = \{\bar{w}, x, z\}$, and consider $F^* = F \cup \{W_1, W_2\}$. Observe
that $C_1$ is the resolvent of $W_1$ and $W_2$. The resolution tree exhibited in
Figure 3 shows that $F^* \in \mathrm{ROR}$. Consider a read-once resolution refutation T
of $F^*$. We show that T is not strict read-once. Again, let $v_1, v_2$ be the
predecessors of the root of T. W.l.o.g., we assume $|\mathrm{pre}(T_{v_1})| \le |\mathrm{pre}(T_{v_2})|$. Similarly
as above, $|\mathrm{pre}(T_{v_1})|, |\mathrm{pre}(T_{v_2})| \ge 3$ follows. Since T is assumed to be read-once,
$|\mathrm{pre}(T_{v_1})| + |\mathrm{pre}(T_{v_2})| \le |F^*|$; thus $|\mathrm{pre}(T_{v_1})| = 3$. It can be verified that there
is no resolution tree $T'$ with $\mathrm{pre}(T') \subseteq F^*$, $|\mathrm{pre}(T')| = 3$ and $|\mathrm{con}(T')| = 1$,
such that $W_1 \in \mathrm{pre}(T')$ or $W_2 \in \mathrm{pre}(T')$. However, $W_1, W_2 \in \mathrm{pre}(T)$ since
$F \notin \mathrm{ROR}$. It follows that $W_1, W_2 \in \mathrm{pre}(T_{v_2})$ and $|\mathrm{pre}(T_{v_2})| = 4$. Hence we
have $\mathrm{pre}(T_{v_2}) = \{W_1, W_2, D_1, D_2\}$ for some $D_1, D_2 \in \{C_2, \ldots, C_5\}$. Checking
all possibilities for $D_1, D_2$ shows that either $\{D_1, D_2\} = \{C_2, C_4\}$ or
$\{D_1, D_2\} = \{C_3, C_5\}$. In both cases, the two vertices $u_1, u_2$ of T which are
labeled by $W_1$ and $W_2$, respectively, have a common successor u. Evidently u
is labeled by $C_1$. Since $C_1 \in \mathrm{pre}(T)$, it follows that T is not strict read-once.
Hence $\mathrm{SROR} \neq \mathrm{ROR}$. □

4 NP-Completeness Results 

Let F be a formula with m clauses and $T = (V, A, \lambda)$ a read-once (strict
read-once, literal-once, respectively) resolution refutation of F. Clearly T has at most
m leaves, and so $|V| \le 2m - 1$. Thus one can guess such a resolution refutation T
of F and verify in deterministic polynomial time whether T is indeed read-once
(strict read-once, literal-once, respectively). Hence the following holds.

Lemma 1 The recognition problems for LOR, SROR, and ROR are in NP.

Next we state our main result whose proof we present in Section 6.

Theorem 1 Recognition of LOR is NP-complete.

We are going to show that recognition of ROR and recognition of SROR 
are both NP-complete problems as well. We proceed by reducing recognition of 
LOR to recognition of SROR and ROR, respectively. For these reductions, the 
following construction is crucial. 

Let F be a formula. For each $x \in \mathrm{var}(F)$ we take two new variables $x[1], x[2]$,
and for every clause $C \in F$ we define

$C^\circ = \{ \overline{x[1]} \mid x \in C \} \cup \{ \overline{x[2]} \mid \bar{x} \in C \}$.

We put 

$F^\circ = \{ C^\circ \mid C \in F \} \cup \{ \{x[1], x[2]\} \mid x \in \mathrm{var}(F) \}$.
Observe that $F^\circ$ is satisfiable if and only if F is satisfiable; furthermore, for
every $x[i] \in \mathrm{var}(F^\circ)$ there is exactly one clause $C \in F^\circ$ with $x[i] \in C$.

The following result is a direct consequence of Lemmas 4, 5, and 7, which 
are more technical and will be presented in the Appendix. 

Proposition 2 For every formula F the following statements are equivalent.

$F \in \mathrm{LOR}$; $F^\circ \in \mathrm{ROR}$; $F^\circ \in \mathrm{SROR}$.

The next two results follow from Theorem 1 and Proposition 2.

Theorem 2 Recognition of SROR is NP-complete.

Theorem 3 (Iwama and Miyano [8]) Recognition of ROR is NP-complete.

5 Literal— Once Resolution and Minimal Unsatisfiable 

Formulas 

In this section we apply Theorem 1 to answer a question posed by Kullmann
([11]). A formula F is minimal unsatisfiable if F is unsatisfiable but F \ {C} is
satisfiable for every C ∈ F. The deficiency δ(F) of a formula F is defined by

δ(F) = |F| − |var(F)|.

Let k be an integer; we write MU(k) for the class of minimal unsatisfiable
formulas F with δ(F) = k. By a result due to Tarsi ([1]), MU(k) = ∅ for k ≤ 0.
Recognition of minimal unsatisfiable formulas is D^P-complete ([12]); however,
for every fixed k, the class MU(k) can be recognized in polynomial time ([11,5]).
In [11], Kullmann asked whether recognizing

C = { F | there is some F' ⊆ F with F' ∈ MU(1) }

is NP-complete. We answer this question positively: in the next proposition we
show C = LOR; hence NP-completeness of C follows from Theorem 1.
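For concreteness, deficiency and membership in MU(k) can be computed by brute force on small formulas. This is only an illustrative sketch: the function names and the integer encoding of literals are assumptions, and the exhaustive test is exponential, unlike the polynomial-time recognition algorithms cited above.

```python
from itertools import product


def satisfiable(formula):
    """Brute-force satisfiability; clauses are frozensets of nonzero ints."""
    variables = sorted({abs(l) for c in formula for l in c})
    for bits in product([False, True], repeat=len(variables)):
        val = dict(zip(variables, bits))
        if all(any(val[abs(l)] == (l > 0) for l in c) for c in formula):
            return True
    return False


def deficiency(formula):
    """delta(F) = |F| - |var(F)|."""
    return len(formula) - len({abs(l) for c in formula for l in c})


def in_mu(formula, k):
    """True if F is minimal unsatisfiable with deficiency k."""
    formula = set(formula)
    if satisfiable(formula):
        return False
    # every proper subformula obtained by dropping one clause must be satisfiable
    minimal = all(satisfiable(formula - {c}) for c in formula)
    return minimal and deficiency(formula) == k
```

For example, the formula {{x, y}, {¬x, y}, {¬y}} has three clauses and two variables, so its deficiency is 1, and removing any clause makes it satisfiable.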

Proposition 3 Let F be a formula. Then F ∈ MU(1) if and only if there is a
literal-once resolution refutation T with pre(T) = F. Consequently LOR = C.

Proof. We apply the following results from [4].

(i) If F ∈ MU(1) and F ≠ {□}, then there is a literal ℓ and clauses C_1, C_2 ∈ F
such that C_1 is the only clause of F containing ℓ and C_2 is the only clause
of F containing ¬ℓ.

(ii) Let F be a formula and ℓ a literal such that there are unique clauses
C_1, C_2 ∈ F with ℓ ∈ C_1 and ¬ℓ ∈ C_2; let C_{1,2} be the resolvent of C_1 and
C_2. Then F ∈ MU(1) if and only if (F \ {C_1, C_2}) ∪ {C_{1,2}} ∈ MU(1).

We proceed by induction on |F|. The proposition evidently holds if |F| = 1;
hence consider |F| > 1. Assume F ∈ MU(1) and choose ℓ, C_1, and C_2 according
to (i). It follows now from (ii) that F* = (F \ {C_1, C_2}) ∪ {C_{1,2}} ∈ MU(1).
By induction hypothesis, there is a literal-once resolution refutation T* with
C_{1,2} ∈ pre(T*) = F*. We extend T* to a literal-once resolution refutation T
with pre(T) = F by adding leaves v_1, v_2 (labeled by C_1 and C_2, respectively)
to T*.

Conversely, assume that there is a literal-once resolution refutation
T = (V, A, λ) with pre(T) = F. Choose two leaves v_1, v_2 of T which have a
common successor v. Put C_i = λ(v_i), i = 1, 2, and C_{1,2} = λ(v). Consequently,
there is a literal ℓ such that ℓ ∈ C_1 and ¬ℓ ∈ C_2. Hence removing v_1 and v_2
from T yields a literal-once resolution refutation T* with
pre(T*) = (F \ {C_1, C_2}) ∪ {C_{1,2}}; pre(T*) ∈ MU(1) by induction hypothesis.
It follows now from (ii) that F ∈ MU(1). □

In [4] it is shown that every minimal unsatisfiable Horn formula belongs to
MU(1). Since every unsatisfiable Horn formula contains a minimal unsatisfiable
Horn formula, Proposition 3 implies the following.

Proposition 4 Every unsatisfiable Horn formula is refutable by literal-once res- 
olution. 

6 Proof of Theorem 1 

This section is devoted to a proof of Theorem 1. We reduce 3-SAT to recognition 
of LOR (in fact we could reduce SAT as well, but we choose 3-SAT to keep 
notation simpler). In a first step we reduce 3-SAT to the problem of finding 
a “satisfying path” in a digraph D, i.e., a path which does not run through 
prescribed pairs of vertices. In a second step we mimic this path problem by 
constructing a formula F such that literal-once resolution refutations of F and 
satisfying paths of D correspond to each other. 

First we prove two short lemmas which we will need below. 

Lemma 2 Let T be a literal-once resolution tree and C_1, C_2 ∈ pre(T) with
ℓ ∈ C_1 and ¬ℓ ∈ C_2 such that rlit(v) = {ℓ, ¬ℓ} for the root v of T. Then
C_1 ∩ C_2 ⊆ con(T).

Proof. Let v be the root of T and v_1, v_2 the predecessors of v. Consider
ℓ' ∈ C_1 ∩ C_2. Since T is literal-once, it follows that ℓ' cannot be an element
of both rlit(T_{v_1}) and rlit(T_{v_2}). Hence ℓ' ∈ λ(v) = con(T). □

Lemma 3 Let T = (V, A, λ) be a literal-once resolution refutation and
C_1, C_2 ∈ pre(T). Then there cannot be distinct literals ℓ, ℓ' ∈ C_1 such that
¬ℓ, ¬ℓ' ∈ C_2.

Proof. We observe that there are vertices v, v_1, v_2 ∈ V such that v_1, v_2 are
predecessors of v and ℓ ∈ rlit(v). W.l.o.g., assume ℓ ∈ λ(v_1) and ¬ℓ ∈ λ(v_2).
It follows that C_1 ∈ pre(T_{v_1}) and C_2 ∈ pre(T_{v_2}). Since
rlit(T_{v_1}) ∩ rlit(T_{v_2}) = ∅, ℓ is the only literal with ℓ ∈ C_1 and
¬ℓ ∈ C_2. □

Construction I. Let F_3 = {C_1, ..., C_n} be a formula with
C_i = {ℓ_{i,1}, ℓ_{i,2}, ℓ_{i,3}} for 1 ≤ i ≤ n. We write L for the set of
literals ℓ such that var(ℓ) ∈ var(F_3). Further, for ℓ ∈ L we put

q(ℓ) = { i | ¬ℓ ∈ C_i, 1 ≤ i ≤ n }.

Observe that i ∉ q(ℓ) for every ℓ ∈ C_i, 1 ≤ i ≤ n, since clauses do not contain
complementary pairs of literals. We assume w.l.o.g. that F_3 has no pure literals;
i.e., |q(ℓ)| ≥ 1 for every ℓ ∈ L.

We construct a digraph D = (V, A) as follows. We take a set of n + 1 vertices
{u_0, ..., u_n}, and for i = 1, ..., n we join u_{i−1} and u_i by three (directed)
paths P_{i,1}, P_{i,2}, P_{i,3} of length |q(ℓ_{i,1})| + 1, |q(ℓ_{i,2})| + 1,
|q(ℓ_{i,3})| + 1, respectively. We denote the set of inner vertices of P_{i,j} by
V_{i,j} (1 ≤ i ≤ n, 1 ≤ j ≤ 3). Hence we have |V_{i,j}| = |q(ℓ_{i,j})| for
0 < i ≤ n, 1 ≤ j ≤ 3. Now we form a set S of pairs {v, v'} of vertices
v, v' ∈ V \ {u_0, ..., u_n} such that

- there is a pair {v, v'} ∈ S with v ∈ V_{i,j} and v' ∈ V_{i',j'} (1 ≤ i < i' ≤ n,
1 ≤ j, j' ≤ 3) if and only if ℓ_{i,j} = ¬ℓ_{i',j'}, and

- every vertex in V \ {u_0, ..., u_n} is contained in exactly one pair of S.

Note that such a set S exists and can be obtained efficiently. We call a directed
path in D satisfying if it runs from u_0 to u_n and contains at most one vertex of
each pair in S. Observe that each satisfying path has to pass through all of the
vertices u_0, ..., u_n in increasing order.

Claim 1 F_3 is satisfiable if and only if D has a satisfying path.

Proof. If F_3 is satisfied by some truth assignment t, then we can choose
σ(i) ∈ {1, 2, 3} for 0 < i ≤ n such that t(ℓ_{i,σ(i)}) = 1. We observe that

P = P_{1,σ(1)} ⋯ P_{n,σ(n)}     (1)

is a satisfying path. Conversely, by definition, every satisfying path P is of the
form (1) for some σ : {1, ..., n} → {1, 2, 3}. Thus, if P is a satisfying path,
then putting t(ℓ_{i,σ(i)}) = 1 for 0 < i ≤ n induces a truth assignment t which
satisfies F_3. □
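Construction I and Claim 1 can be checked on small inputs. The sketch below is an illustration under stated assumptions, not the authors' implementation: clauses are 3-tuples of nonzero integers, an inner vertex of P_{i,j} is named by the hypothetical triple (i, j, k), where k is the clause containing the complementary literal it is paired with, and satisfying paths are found by exhaustive search over the choices σ.

```python
from itertools import product


def build_digraph(clauses):
    """Return (inner, pairs): inner[(i, j)] lists the inner vertices of the
    path P_{i,j}; pairs is the set S. Indices i, k start at 1 as in the text."""
    inner = {}
    for i, clause in enumerate(clauses, start=1):
        for j, lit in enumerate(clause, start=1):
            q = [k for k, c in enumerate(clauses, start=1) if -lit in c]
            inner[(i, j)] = [(i, j, k) for k in q]  # |V_{i,j}| = |q(l_{i,j})|
    pairs = set()
    for (i, j), verts in inner.items():
        for (_, _, k) in verts:
            if i < k:  # add each pair once
                j2 = clauses[k - 1].index(-clauses[i - 1][j - 1]) + 1
                pairs.add(((i, j, k), (k, j2, i)))
    return inner, pairs


def has_satisfying_path(clauses):
    """Try every sigma: {1..n} -> {1,2,3}; the path P_{1,sigma(1)}...P_{n,sigma(n)}
    is satisfying if it contains at most one vertex of each pair in S."""
    inner, pairs = build_digraph(clauses)
    n = len(clauses)
    for sigma in product((1, 2, 3), repeat=n):
        on_path = {v for i in range(1, n + 1) for v in inner[(i, sigma[i - 1])]}
        if not any(v in on_path and w in on_path for v, w in pairs):
            return True
    return False


def satisfiable(clauses):
    """Brute-force satisfiability test for comparison."""
    variables = sorted({abs(l) for c in clauses for l in c})
    for bits in product([False, True], repeat=len(variables)):
        val = dict(zip(variables, bits))
        if all(any(val[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False
```

The exhaustive search is exponential in n and serves only to compare `has_satisfying_path` with `satisfiable` on toy formulas.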

Note that the above construction is closely related to the connection method 
(see, e.g., [13,3,10]). 



Construction II. Let D = (V, A) be the digraph obtained from a given 3-CNF
formula F_3 according to Construction I. We consider a portion of distinct
boolean variables: for 0 ≤ i ≤ n we take a new variable ν_i; for each arc a ∈ A
we take a new variable α_a; for each pair p ∈ S we take three distinct new
variables β_p, γ_p, δ_p. We define a formula F with

var(F) = {ν_0, ..., ν_n} ∪ { α_a | a ∈ A } ∪ { β_p, γ_p, δ_p | p ∈ S }

by

F = ⋃_{v ∈ V} F(v)

and the following definitions (recall that in(v) and out(v) denote the sets of arcs
incoming to and outgoing from v, respectively). For 0 ≤ i ≤ n let

F(u_i) = { {¬α_a, ν_i} | a ∈ in(u_i) } ∪ { {α_b, ¬ν_i} | b ∈ out(u_i) }.     (2)

For p = {v, v'} ∈ S with

in(v) = {a}, out(v) = {b}, in(v') = {a'}, out(v') = {b'}     (3)

we put

F(v) = { {¬α_a, β_p, γ_p}, {¬β_p, γ_p}, {¬γ_p, ¬δ_p}, {α_b, ¬γ_p, δ_p} },
F(v') = { {¬α_{a'}, β_p, ¬γ_p}, {¬β_p, ¬γ_p}, {γ_p, ¬δ_p}, {α_{b'}, γ_p, δ_p} },

and write F(p) = F(v) ∪ F(v').

Claim 2 Let T = (V, A, λ) be a literal-once resolution refutation of F and
p = {v, v'} ∈ S. If F(p) ∩ pre(T) ≠ ∅ then either F(p) ∩ pre(T) = F(v) or
F(p) ∩ pre(T) = F(v').

Proof. Let a, a', b, b' ∈ A according to (3). We use the shorthands

C_1 = {¬α_a, β_p, γ_p},       C_1' = {¬α_{a'}, β_p, ¬γ_p},
C_2 = {¬β_p, γ_p},            C_2' = {¬β_p, ¬γ_p},
C_3 = {¬δ_p, ¬γ_p},           C_3' = {¬δ_p, γ_p},
C_4 = {α_b, ¬γ_p, δ_p},       C_4' = {α_{b'}, γ_p, δ_p}

so that F(v) = {C_1, ..., C_4} and F(v') = {C_1', ..., C_4'}. First we show

{C_1, C_1'} ⊄ pre(T).     (4)

Suppose to the contrary that {C_1, C_1'} ⊆ pre(T). Consequently, there is some
v ∈ V such that γ_p ∈ rlit(v). Thus C_1, C_1' ∈ pre(T_v). By Lemma 2 it follows
that β_p ∈ λ(v). Hence there must be a clause C ∈ pre(T) \ pre(T_v) with
¬β_p ∈ C. By construction of F, C_2 and C_2' are the only clauses of F which
contain ¬β_p. Observe that γ_p ∈ C_2 and ¬γ_p ∈ C_2'. Thus γ_p ∈ rlit(w) for
some vertex w of T outside T_v, since con(T) = □. However, γ_p ∈ rlit(v), and
therefore we have a contradiction to the assumption that T is literal-once.
Whence (4) holds. By analogous arguments one can show

{C_4, C_4'} ⊄ pre(T),   {C_2, C_2'} ⊄ pre(T),   {C_3, C_3'} ⊄ pre(T).     (5)

We show that

C_1 ∈ pre(T) ⟺ C_2 ∈ pre(T).     (6)

Assume C_1 ∈ pre(T). Since β_p ∈ C_1, there must be a clause C ∈ pre(T) with
¬β_p ∈ C; C_2 and C_2' are the only clauses of F which contain ¬β_p. By Lemma 3
we conclude that C_2' ∉ pre(T); thus C_2 ∈ pre(T). Whence we have shown one
direction of (6). The converse can be shown similarly applying Lemma 3.
Moreover, one can show by analogous arguments that

C_1' ∈ pre(T) ⟺ C_2' ∈ pre(T),
C_3 ∈ pre(T) ⟺ C_4 ∈ pre(T),     (7)
C_3' ∈ pre(T) ⟺ C_4' ∈ pre(T).

Finally we observe that

pre(T) ∩ {C_1, C_2, C_1', C_2'} ≠ ∅ ⟺ pre(T) ∩ {C_3, C_4, C_3', C_4'} ≠ ∅.     (8)

Claim 2 now follows from (4)-(8). □



Claim 3 D has a satisfying path if and only if F ∈ LOR.

Proof. Assume that D has a satisfying path P. We denote by V(P) and A(P)
the vertices and arcs of P, respectively. For 0 ≤ i ≤ n we put

F_P(u_i) = { {¬α_a, ν_i} | a ∈ in(u_i) ∩ A(P) } ∪
           { {α_b, ¬ν_i} | b ∈ out(u_i) ∩ A(P) }

and for v ∈ V(P) \ {u_0, ..., u_n} we put F_P(v) = F(v). We show that

F(P) = { {ν_0}, {¬ν_n} } ∪ ⋃_{v ∈ V(P)} F_P(v)

can be refuted by literal-once resolution (observe that F(P) ⊆ F). Consider a
vertex v ∈ V(P) with p = {v, v'} ∈ S. Using the same notation as in the proof
of Claim 2, we have F(v) = {C_1, C_2, C_3, C_4} ⊆ F(P). Now
C_{1,2} = {¬α_a, γ_p} is a resolvent of C_1 and C_2; C_{3,4} = {α_b, ¬γ_p} is a
resolvent of C_3 and C_4. Further, C_v = {¬α_a, α_b} is a resolvent of C_{1,2}
and C_{3,4}. Hence finding a literal-once resolution refutation of F(P) reduces
to finding a literal-once resolution refutation of
(F(P) \ F(v)) ∪ {{¬α_a, α_b}}. Similarly, if v' ∈ V(P) with p = {v, v'} ∈ S,
then it suffices to find a literal-once resolution refutation of
(F(P) \ F(v')) ∪ {{¬α_{a'}, α_{b'}}}. By multiple applications of this argument,
F(P) can be reduced to a formula of the form

F_lin = { {ℓ_1}, {¬ℓ_1, ℓ_2}, {¬ℓ_2, ℓ_3}, ..., {¬ℓ_{r−1}, ℓ_r}, {¬ℓ_r} }.

It is easy to construct a literal-once resolution refutation T_lin for F_lin. Now
T_lin can be extended by the above considerations to a literal-once resolution
refutation of F(P). Whence F ∈ LOR follows.

Conversely, assume that F ∈ LOR. We show that D has a satisfying path.
Let T be a literal-once resolution refutation of F and put F' = pre(T). Let W
be the set of vertices w ∈ V such that there is at least one arc
a ∈ in(w) ∪ out(w) with α_a ∈ var(F'). Clearly W ≠ ∅. Since F' has no pure
literals, it follows that for every w ∈ W \ {u_0, u_n} there are arcs a ∈ in(w),
b ∈ out(w) such that α_a, α_b ∈ var(F') (if w = u_i for some 1 ≤ i ≤ n − 1 this
is obvious; on the other hand, if w belongs to some pair in S, then it follows by
Claim 2). Thus, for every such w, at least one predecessor and at least one
successor belongs to W. Consider the subdigraph D_W of D induced by W. Clearly
D_W is acyclic, since D is acyclic by construction. Every nonempty acyclic
digraph has at least one vertex s without incoming arcs and at least one vertex t
without outgoing arcs. For D_W the only possibility is s = u_0 and t = u_n. We
conclude that D_W contains a path P from u_0 to u_n. By Claim 2 it follows that
for every {v, v'} ∈ S at most one of v, v' belongs to W. Thus P must be a
satisfying path necessarily. This completes the proof of the claim. □

In view of Lemma 1, Theorem 1 now follows from Claims 1, 3, and the 
NP-completeness of the 3-SAT problem. 



Appendix: Technical Lemmas 

Lemma 4 F ∈ LOR implies F° ∈ LOR for every formula F.

Proof. We show by induction on |V| that for every literal-once resolution tree
T = (V, A, λ) there is a literal-once resolution tree T' with pre(T') = pre(T)°,
con(T') = con(T)°, and rlit(T') = { x[i], ¬x[i] | x ∈ var(rlit(T)), i = 1, 2 }.

If |V| = 1, then there is nothing to show. Assume |V| > 1 and let v be the
root of T and x the variable in rlit(v). Moreover, let v_1, v_2 be the
predecessors of v such that x ∈ λ(v_1) and ¬x ∈ λ(v_2). For i = 1, 2 let T_i' be
a literal-once resolution tree obtained from T_{v_i} as supplied by the induction
hypothesis. Since rlit(T_{v_1}) ∩ rlit(T_{v_2}) = ∅, it follows that
rlit(T_1') ∩ rlit(T_2') = ∅. Now x[i] ∈ con(T_i') = con(T_{v_i})°. It is obvious
how T_1' and T_2' can be assembled to a literal-once resolution tree T' with the
desired properties by adding two non-leaves and a leaf w with
λ(w) = {¬x[1], ¬x[2]}. □

The following Lemma is due to an observation by Kullmann. 

Lemma 5 F° ∈ ROR implies F° ∈ LOR for every formula F.

Proof. Observe that for every resolution tree T = (V, A, λ) and two distinct
vertices v, v' ∈ V with rlit(v) = rlit(v') = {x, ¬x}, there must be at least four
distinct leaves u_1, u_2, u_1', u_2' ∈ V such that x ∈ λ(u_1) ∩ λ(u_1') and
¬x ∈ λ(u_2) ∩ λ(u_2'). (Every vertex v ∈ V with rlit(v) = {x, ¬x} "consumes" at
least one leaf u_1 with x ∈ λ(u_1) and one leaf u_2 with ¬x ∈ λ(u_2).) However,
for every variable x[i] of F° there is exactly one clause C ∈ F° such that
¬x[i] ∈ C. Hence every read-once resolution refutation of F° is literal-once. □

Lemma 6 Let F be a formula and T a resolution refutation with pre(T) ⊆ F°.
Then pre(T) = F_1° for some F_1 ⊆ F.

Proof. Follows from the fact that pre(T) has no pure literals. □

Lemma 7 F° ∈ LOR implies F ∈ LOR for every formula F.

Proof. We show by induction on |F| that if T is a literal-once resolution
refutation with pre(T) = F°, then there is a literal-once resolution refutation T'
with pre(T') = F; the lemma will follow by Lemma 6. If |F| = 1, then
F = F° = {□}, and the result follows by taking T' = T. Now assume |F| > 1 and
let T be a literal-once resolution refutation with pre(T) = F°. We call a vertex
v' of T mistimed if there is a predecessor v_1 of v' with
λ(v_1) = {¬x[1], ¬x[2]}, x ∈ var(F), and a successor v of v' such that
rlit(v) ∩ {x[1], x[2]} = ∅. Mistimed vertices can be successively eliminated as
follows (roughly speaking, we shift leaves labeled by clauses of the form
{¬x[1], ¬x[2]} towards the root). Consider a mistimed vertex v' of T with
predecessors v_1 and v_2 such that λ(v_2) = {¬x[1], ¬x[2]}, x ∈ var(F). Let v be
the successor of v' such that v' and v'' are the predecessors of v. We remove the
arcs (v_1, v') and (v'', v) from T and add instead the arcs (v_1, v) and
(v'', v'). Clearly λ(v') and λ(v) can be modified appropriately such that the
result is still a read-once resolution refutation with the same set of premises.
Hence we can assume, w.l.o.g., that T has no mistimed vertices.

We write L_1 for the set of leaves w of T with λ(w) = C° for some C ∈ F,
and we write L_2 for the set of leaves of T not in L_1 (i.e., if w ∈ L_2, then
λ(w) = {¬x[1], ¬x[2]} for some x ∈ var(F)). Observe that for any two leaves
v_1, v_2 of T which have the same successor, either v_1 ∈ L_1 and v_2 ∈ L_2, or
vice versa. Therefore, if T is nontrivial, then the height of T (i.e., the length
of a longest path in T) is at least 2.

We choose a vertex v of T such that T_v has height 2. Since T has no mistimed
vertices by assumption, we conclude that exactly one leaf of T_v is in L_2. Hence
v has two predecessors v' and v'' such that v' has two predecessors v_1 ∈ L_1 and
v_2 ∈ L_2, and v'' ∈ L_1. Let Q, R ∈ F and x ∈ var(F) such that
λ(v_2) = {¬x[1], ¬x[2]}, λ(v_1) = Q°, and λ(v'') = R°. It follows for
{i, j} = {1, 2} that x[i] ∈ rlit(v') and x[j] ∈ rlit(v). Observe that
x[i] ∉ var(R°); otherwise there would be a leaf v_2' ≠ v_2 with
λ(v_2') = λ(v_2). We conclude that x[1], x[2] ∉ var(λ(v)). Thus Q and R have a
resolvent C with λ(v) = C°. Let T_0 be the resolution tree obtained from T by
removing v_1, v_2, v', v''. We have

pre(T_0) = (pre(T) \ { {¬x[1], ¬x[2]}, Q°, R° }) ∪ {C°}.

Clearly T_0 is literal-once, hence the induction hypothesis applies. Thus, there
is a literal-once resolution refutation T_0' with pre(T_0')° = pre(T_0); in
particular, C ∈ pre(T_0'). Let w be the leaf of T_0' labeled by C. It is now
obvious how a literal-once resolution refutation T' = (V', A', λ') can be obtained
from T_0': we add two vertices w_1, w_2 and the arcs (w_1, w), (w_2, w) to T_0',
and we put λ'(w_1) = Q and λ'(w_2) = R. Hence the lemma follows. □
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Abstract. Connection graph resolution (cg-resolution) was introduced 
by Kowalski as a means of restricting the search space of resolution.
Several researchers expected unrestricted connection graph (cg) resolution
to be strongly complete until Eisinger proved that it was not. In this
paper, ordered resolution is shown to be a special case of cg-resolution, and
that relationship is used to prove that ordered cg-resolution is strongly
complete. On the other hand, ordered resolution provides little insight
about completeness of first order cg-resolution and little about the
establishment of strong completeness from completeness. A first order version
of Eisinger's cyclic example is presented, illustrating the difficulties with
first order cg-resolution. But resolution with selection functions does yield
a simple proof of strong cg-completeness for the unit-refutable class.



1 Introduction 

Connection graph resolution (cg-resolution) was introduced by Kowalski
in 1975 [10] as a means of restricting the search space of resolution. Proving
completeness turned out to be non-trivial, the first proof appearing in Bibel's
seminal paper [3] in 1981. Several researchers expected cg-resolution to be
strongly complete (any sequence of cg-resolution steps applied to a propositional
formula would eventually terminate) because it is a destructive calculus: links
are deleted upon activation. More than one author¹ tried to prove strong
completeness of cg-resolution until Eisinger's famous example [7] dispelled that
idea. He showed that even a quite restrictive notion of fairness cannot prevent
cyclic derivations.

An example of a strongly complete calculus is path dissolution [12], which
operates by deleting all paths (and thus the link itself) through a given link,
decreasing the number of paths in the formula. Since there are finitely many
paths, the process terminates in a linkless formula. That formula is empty if
the original formula was unsatisfiable; otherwise the remaining paths are
models of the original formula. What is interesting for the discussion here is
that dissolution has a metric, the number of paths, that is reduced each time a
link is activated. Such behavior guarantees termination regardless of the order
in which links are selected. Any destructive calculus might exhibit this behavior,
i.e., might reduce some metric, guaranteeing termination. This is what many
researchers hoped would be true for cg-resolution. There is no obvious metric with
that property for cg-resolution: while links are deleted, the number of links does
not in general decrease because additional links are inherited. Nonetheless, there
might have been some metric that did decrease; perhaps the number of links
would eventually decrease monotonically. Of course, Eisinger's work dispelled
that notion as well.

¹ The intersection of that club with the authors of this paper is non-empty.

It should be noted that any complete calculus together with breadth first 
search is strongly complete. To be interesting, strong completeness should arise 
from some intrinsic property of the calculus itself. Moreover, it should not re- 
quire a search that is so broad as to virtually guarantee long derivations. For 
example, Bibel’s original proof of cg-resolution completeness activated every link 
associated with one atom, and then, when every occurrence of that atom was 
unlinked (due to link deletions), links to the next atom are activated. Adding 
that link selection strategy to cg-resolution does make it strongly complete, but 
not in a very interesting way because it forces a large search space. 

In [3] Bibel ordered literals (and, in effect, links) and used the ordering to
prove that cg-resolution is complete. The key to his proof is that selecting and
activating links based on the ordering (along with judicious use of tautology
deletion) preserves spanning if the initial connection graph has the full set of
links. It is the initial full set of links, and in turn tautology deletion, that
enables not only the completeness result but the observation that the technique is
in fact strongly complete, albeit not in a very interesting way, as noted above.

Since the desired result — that unrestricted cg-resolution is strongly com- 
plete — simply is not true, one might ask the question, is there a restriction on 
cg-resolution that is strongly complete? One approach is to look for the weakest 
possible restriction with the goal of obtaining the most general result. A second 
approach is to build up from Bibel’s ordering, which is the approach adopted 
in this paper. That is, an ordered strategy for link selection that is more gen- 
eral (and more interesting from a strong completeness point of view) than the 
one Bibel employed is shown to be strongly complete. It turns out that ordered 
resolution is a special case of cg-resolution, so the strong completeness result 
presented in Section 2 is not really a new result. But the proofs seem to be sub- 
stantially simplified, and some of the theorems have apparently never appeared 
in print. 

On the other hand, ordered resolution provides little insight about complete- 
ness of first order cg-resolution and little about the establishment of strong 
completeness from completeness. A first order version of Eisinger’s cyclic exam- 
ple is presented, illustrating the difficulties with first order cg-resolution. The 
problem in each case is that liftable literal orderings are not total on the first
order level.
However, cg-resolution restricted to unit-refutable clause sets is strongly
complete. There is a simple proof that employs resolution with selection functions;
see Section 4.2.



2 Preliminaries 



A multiset over a set L is a mapping M from L to the non-negative integers; M
is finite if M(l) = 0 for all but finitely many l ∈ L. Conceptually, a multiset is
a collection of any number (including zero) of occurrences of the elements of L.
The set L may be treated as a multiset by letting M(l) = 1 for all l ∈ L. If ≺ is
a strict partial order (i.e., an irreflexive, transitive relation) on L, then ≺ can
be extended to multisets over L. If M and M' are distinct multisets over L, then
M' ≺ M if whenever there is an l in L with M(l) < M'(l), there is an l' in L with
l ≺ l' and M'(l') < M(l'). In other words, to diminish a multiset, remove one
occurrence of an element and replace it with any finite number of occurrences of
smaller elements. Well-foundedness and totality of ≺ are inherited by a multiset
extension; a good source for this material is [6].
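The multiset extension can be written down directly from this definition. The following sketch is illustrative only; the function name `multiset_less` and the use of `collections.Counter` to represent finite multisets are choices made here, not part of the paper.

```python
from collections import Counter


def multiset_less(m_small, m_big, less):
    """True if m_small ≺ m_big in the multiset extension of the strict
    order `less`. Both arguments are Counters (finite multisets)."""
    elems = set(m_small) | set(m_big)
    if all(m_small[l] == m_big[l] for l in elems):
        return False  # the two multisets must be distinct
    for l in elems:
        if m_big[l] < m_small[l]:
            # every surplus in m_small must be dominated by a larger
            # element that occurs more often in m_big
            if not any(less(l, l2) and m_small[l2] < m_big[l2] for l2 in elems):
                return False
    return True
```

For instance, with the usual order on integers, the multiset {2, 2, 1} is smaller than {3}: one occurrence of 3 has been replaced by several occurrences of smaller elements.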

Let Σ be a countable propositional signature (i.e., atom set). A literal is
either an atom or a symbol of the form ¬p, where p is an atom, and a clause
is a finite set of literals. Hence, since a clause may be treated as a multiset of
literals, any order on literals defines an order on clauses. If l is a literal,
then its complement, also written ¬l, is the literal ¬p if l = p for some p ∈ Σ,
and it is p if l = ¬p for some p ∈ Σ. As usual, □ denotes the empty clause.

A link in a finite set of clauses S is a set {l, ¬l}, where the complementary
literals l and ¬l occur in distinct clauses of S. Links are ordered as sets if
literals are ordered, and this ordering can be extended to multisets of links. An
occurrence of the literal l in the clause C may be written l_C, and {l_C, ¬l_D}
denotes a link, where l ∈ C, ¬l ∈ D. A link {l_C, ¬l_D} is said to be ordered if
l_C is the maximum literal in C (with respect to ≺), and ¬l_D is the maximum
literal in D. All orderings are well-founded because the number of distinct
clauses and links is finite.

To define connection graph resolution, which is the focus of this paper, we first
define ordinary resolution. Let S be a set of clauses, and let {l_C, ¬l_D} be a
link in S. Then the clause E = (C − {l_C}) ∪ (D − {¬l_D}) is the resolvent of S
with respect to the link {l_C, ¬l_D}. The clause E is said to be an ordered
resolvent if {l_C, ¬l_D} is an ordered link, and the resolution procedure that
requires every resolvent be ordered is called ordered resolution. Typically, one
additional assumption is made for ordered resolution: For each atom p, ¬p is the
immediate successor of p.
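A resolvent and the ordered-link test can be sketched as follows. This is a minimal illustration: literals are encoded as nonzero integers, and the sort key (atom index, then sign) is one assumed way of realizing an ordering in which ¬p is the immediate successor of p. The function names are hypothetical.

```python
def resolvent(c, d, lit):
    """Resolve clauses c and d (frozensets of nonzero ints) on the link
    {lit in c, -lit in d}: E = (C - {l}) | (D - {-l})."""
    if lit not in c or -lit not in d:
        raise ValueError("no such link")
    return (c - {lit}) | (d - {-lit})


def literal_key(lit):
    """Order literals by atom, with -p immediately above p."""
    return (abs(lit), lit < 0)


def is_ordered_link(c, d, lit):
    """The link is ordered if lit is maximal in c and -lit is maximal in d."""
    return max(c, key=literal_key) == lit and max(d, key=literal_key) == -lit
```

Ordered resolution would then admit a step only when `is_ordered_link` holds for the chosen link.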

The diagram below illustrates resolution.

[Diagram: clauses C and D joined by the link {l_C, ¬l_D}, with the resolvent E
drawn below them.]

A resolution derivation of a clause E from a clause set S is a finite sequence
of clauses D_1, D_2, ..., D_n = E such that D_i is a resolvent of
S ∪ {D_1, ..., D_{i−1}} for all 1 ≤ i ≤ n. Such a derivation may also be viewed
as the sequence of clause sets S, S ∪ {D_1}, S ∪ {D_1, D_2}, .... An ordered
resolution derivation is one in which each resolution step is ordered.

Connection graph resolution is essentially ordinary resolution with link
deletion. As a result, it is necessary to keep track of the links that are present.
Thus, a connection graph (c-graph) is defined to be a pair (S, ℒ), where S is a
finite set of clauses, and ℒ is a set of links in S. As usual, □ denotes any
c-graph whose clause set contains the empty clause.² The connection graph
containing all possible links is called the full graph of the clause set S.

The notions of path and spanning are key to cg-resolution. A (conjunctive)
path through a set of clauses S is a set containing exactly one literal from each
clause. A c-graph G = (S, ℒ) is said to be spanned by ℒ if each path through S
contains a link from ℒ. Observe that a spanned formula is unsatisfiable, and a
formula is unsatisfiable if and only if it is spanned by the full set of links.
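The definition of spanning can be tested exhaustively on small clause sets. The sketch below is illustrative only: clauses are given as a list of frozensets of nonzero integers, a literal occurrence is represented by the assumed pair (clause index, literal), and the helper names are hypothetical.

```python
from itertools import product


def full_links(clauses):
    """All links: pairs of complementary literal occurrences in distinct clauses."""
    return {frozenset({(i, l), (j, -l)})
            for i, c in enumerate(clauses) for l in c
            for j, d in enumerate(clauses) if i != j and -l in d}


def is_spanned(clauses, links):
    """True if every path (one literal occurrence per clause) contains a link."""
    occurrences = [[(i, l) for l in c] for i, c in enumerate(clauses)]
    for path in product(*occurrences):
        on_path = set(path)
        if not any(link <= on_path for link in links):
            return False
    return True
```

With the full set of links, `is_spanned` agrees with unsatisfiability; removing a link from the full graph of an unsatisfiable set may destroy spanning, which is why link deletion has to be handled with care.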

To define connection graph resolution, let G = (S, ℒ) be a connection graph,
let L = {l_C, ¬l_D} be a link in ℒ, and let E be the clause obtained by resolving
on L. Then connection graph resolution on the link L produces the connection
graph G' = (S', ℒ'), where S' = S ∪ {E}, and

ℒ' = (ℒ − {L}) ∪ { {l'_E, ¬l'_F} | l' ∈ E and ({l'_C, ¬l'_F} ∈ ℒ or {l'_D, ¬l'_F} ∈ ℒ) }.

The clause E is called a cg-resolvent. If G = G_0, G_1, ..., G_n = H is a sequence
of c-graphs such that each G_i is produced from G_{i−1} by cg-resolution, then H
is said to be obtained from G by a cg-resolution derivation. The diagram below
illustrates cg-resolution.

[Diagram: the link resolved upon is marked as deleted; the links attached to the
cg-resolvent are marked as inherited links.]



Let us emphasize that what distinguishes cg-resolution from ordinary resolution
is that, given a c-graph G = (S, ℒ), only the links in ℒ may be cg-resolved upon.
A key property of cg-resolution is that it preserves spanning; see Lemma 1
below.
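A single cg-resolution step can be sketched as follows. This is an illustration under assumptions, not the paper's formulation: clauses are stored in a dict from integer ids to frozensets of nonzero integers, a literal occurrence is a pair (clause id, literal), a link is a frozenset of two occurrences, and the standard inheritance rule is used, in which each literal of the resolvent inherits the links of its parent occurrence. The function name `cg_resolve` is hypothetical.

```python
def cg_resolve(clauses, links, link):
    """Perform one cg-resolution step on `link`; return the new c-graph."""
    if link not in links:
        raise ValueError("only links present in the graph may be resolved upon")
    (c, l), (d, m) = tuple(link)  # m is the complement of l
    e_id = max(clauses) + 1
    new_clauses = dict(clauses)
    new_clauses[e_id] = (clauses[c] - {l}) | (clauses[d] - {m})
    new_links = set(links) - {link}  # the activated link is deleted
    for old in links - {link}:
        for occ in old:
            parent, lit = occ
            # occurrences of the parents other than the two resolved upon
            # pass their links on to the resolvent
            if parent in (c, d) and occ not in link:
                (other,) = old - {occ}
                new_links.add(frozenset({(e_id, lit), other}))
    return new_clauses, new_links
```

Because the activated link is removed while inherited links are added, the number of links need not decrease from one step to the next.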

A literal occurrence l in a c-graph G = (S, ℒ) is called pure if l is unlinked;
i.e., if no link in ℒ contains the occurrence l. The clause C is pure if it
contains a pure literal. A clause that contains complementary literals is called
a tautology. Pure clauses and tautologies are of interest because of the pure rule
and the TAUT lemma, which delete clauses under some circumstances. The next three
lemmas are from Bibel's 1981 paper [3].



² In the presence of a subsumption rule, this graph can be assumed to be ({□}, ∅).

Lemma 1 (Spanning Lemma). If a link present in a spanned connection
graph is cg-resolved upon, then the resulting connection graph is spanned. □

Lemma 2 (Pure Rule). If G is a spanned connection graph, and if a pure
clause together with the associated links are deleted from G, then the resulting
connection graph is spanned. □

Lemma 3 (TAUT Lemma). If G = (S, ℒ) is a spanned connection graph,
and if C ∈ S is a tautology containing the literals l and ¬l, and if C together
with the associated links are deleted from G, then the resulting connection graph
is spanned provided that the following condition is met.

{ {l_D, ¬l_E} | {{l_D, ¬l_C}, {l_C, ¬l_E}} ⊆ ℒ } ⊆ ℒ.  □



Propositional logic is decidable, so one expects a proof procedure to be able
to determine both satisfiability and unsatisfiability for a ground clause set. In
the following, affirmation properties deal with satisfiable clause sets, refutation
properties with unsatisfiable clause sets. Some additional terminology will be
useful: A clause set or c-graph is saturated with respect to (standard or ordered
or cg-) resolution iff no new resolvent can be generated from it.³ Formally, a
proof procedure is refutation complete if whenever S is unsatisfiable, □ can be
derived from S using the proof procedure. It is affirmation complete if
satisfiability of S implies that a saturated clause set (saturated connection
graph) can be derived from S (the full graph of S).

Strong completeness adds to completeness the requirement that some con- 
crete and deterministic procedure actually refutes (or affirms) a given clause set 
after a finite number of resolution steps. In resolution theorem proving the task 
of such a procedure is to determine the next link to be resolved away. 

There are several possibilities for what is meant by a “concrete and deter- 
ministic procedure.” An obvious but trivial way to obtain strong completeness 
is simply to make a breadth first search for all possible derivations. In that case 
there is essentially no distinction between strong and standard completeness. 
Similarly, one can enumerate all possible derivations via backtracking. In prac- 
tice, one is interested in depth first, backtracking-free proof procedures. More- 
over, selecting the link for the next resolution step should be reasonably cheap, 
typically polynomial in the size of the given clause set or c-graph. 

A link selection rule is said to be local if it selects links only from the last
clause set or c-graph in a derivation. This property seems to be implicitly
assumed in much of the resolution literature, and in standard resolution systems
it often comes cheap, i.e., almost automatically. If, for example, a new clause
set is obtained always by adding a resolvent to the previous clause set, then
any selection rule is essentially local. Sometimes, however, a selection rule is
desired in which the behavior is dependent upon the derivation history and
the point in that history when clauses were introduced. A classic example is the
level saturation strategy of Wos et al. [20]. It can easily be implemented if one
stamps each clause with its level. Clauses in the input set S have level 0; if E
is a resolvent of C and D, then level(E) = max{level(C), level(D)} + 1. Then
a strongly complete, local selection rule is obtained by choosing a link whose
resolvent has minimal level and is not already present.

³ With the pure rule, a saturated c-graph must then be (∅, ∅).
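The level-stamping idea can be sketched for ordinary propositional resolution as follows. This is a simplified illustration, not the strategy of [20] as published: it generates all resolvents of one level before moving to the next, and the function name and the `max_level` cutoff are assumptions introduced here.

```python
def level_saturation(clauses, max_level=10):
    """Stamp each derived clause with its level and saturate level by level.
    Clauses are frozensets of nonzero ints; returns a dict clause -> level."""
    level = {frozenset(c): 0 for c in clauses}  # input clauses have level 0
    for current in range(1, max_level + 1):
        new = {}
        known = list(level)
        for c in known:
            for d in known:
                # level(E) = max(level(C), level(D)) + 1 must equal `current`
                if max(level[c], level[d]) != current - 1:
                    continue
                for l in c:
                    if -l in d:
                        e = (c - {l}) | (d - {-l})
                        if e not in level and e not in new:
                            new[e] = current  # skip resolvents already present
        if not new:
            return level  # saturated: no new resolvent can be generated
        level.update(new)
        if frozenset() in level:
            return level  # the empty clause has been derived
    return level
```

Since each clause carries its level, the choice of the next resolvent depends only on the current clause set, which is what makes the rule local.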

Even when destructive operations such as subsumption deletion are intro- 
duced, selection rules have typically been assumed to be local. The consequences 
for completeness, strong or otherwise, can be subtle. Removal of clause B when 
it is subsumed by C from a set containing both is obviously unsatisfiability 
preserving. This was often assumed to be essentially completeness preserving. 
Sibert [15] was the first to point out that this semantic observation, combined 
with completeness of resolution, guaranteed only that a refutation was avail- 
able at any point, provided that only resolution was employed from that point 
onward. These issues were subsequently investigated in great detail — see [11]. 
Of course, an expensive global selection rule that made use of previous clause
sets containing subsumed clauses was simply never contemplated (as far as we
know).

A local selection rule is called a filter in the connection graph literature (an 
excellent reference is [7]^). As with ordinary resolution combined with subsump- 
tion, cg-resolution is destructive. This means that a deduction step not only 
derives new information but alters the present state by removing information 
about the previous state. In particular, any clause and link may vanish during 
a cg-derivation. Not surprisingly, reconciling completeness issues with the de- 
structive nature of the calculus has proved to be considerably more elusive for 
cg-resolution than for the more standard resolution systems. Other examples of 
destructive calculi are free variable tableaux [2] and dissolution [12]. 

Level saturation is an instance of what is often called a fair selection rule. 
In cg-resolution a fair rule guarantees that each link in each derivation vanishes 
within a finite amount of time (called a coveringthree filter by Siekmann and 
Stephan [16]). It can easily be implemented locally by an implicit or explicit 
queue and, in non-destructive calculi (for example, standard resolution), implies 
strong completeness. In destructive calculi, however, it is not obvious that loops 
cannot occur, and, as Eisinger showed — see Theorem 1 below — this can 
in fact happen with cg-resolution. Concern for such issues led to consideration 
of more restrictive fairness conditions. Siekmann and Stephan color each link 
differently in the initial graph and colors are inherited. Then, in addition to 
the fairness condition described above, in each derivation state each color must 

There, locality is not explicitly enforced, but seems to be intended, because a loop 
check on the derivation history for isomorphic c-graphs would deal with the cyclic 
counter example. 
be resolved upon within a finite number of steps (this was called a coveringtwo
filter). Eisinger’s example [7] shows that even this version of fairness is not 
enough. 

Theorem 1. For ground cg-resolution there are fair selection rules with the 
coloring restriction that are not strongly refutation complete. □ 



Ordered resolution is one refinement of resolution that is strongly complete.
Let $\prec$ be a total, well-founded ordering of literals with the property that the
complement of a literal is the literal’s immediate predecessor or successor. Then
the extension of $\prec$ to clauses is well-founded and total, and the resolvent of any
ordered resolution step is smaller than each of its parents. By well-foundedness,
there are only finitely many smaller clauses than the greatest clause in S. This
observation ensures that ordered resolution terminates. In Section 3, termination
is combined with completeness to obtain strong completeness. Kowalski
and Hayes [9] used a very different approach to show that ordered resolution is
complete: a semantic tree argument.



3 Strong Completeness 

We begin this section by showing that ordered resolution is a special case of cg- 
resolution.® The key is Lemma 5, which says, in effect, that any total ordering 
of literals can be used as a guide for link selection with cg-resolution. It is very 
easy to see that ordered links are present in any spanned connection graph since 
the path containing the maximal element from each clause must contain a link. 
The lemma says that every ordered link not yet activated must be present. 

First, we point out that ground clauses become pure only because of links 
deleted after activation. 

Lemma 4. A ground clause may become pure only from a cg-resolution step 
activating the sole link to one of its literals. 

Proof: Given clauses $C = \{p_C\} \cup A$ and $D = \{\neg p_D\} \cup B$ and link $L = \{p_C, \neg p_D\}$,
consider a cg-resolution step activating L, producing the cg-resolvent $E = A \cup B$.
Clauses C and D were not pure prior to activating L. If L is the only link to $p_C$
or to $\neg p_D$, then one or both parents may become pure and be deleted. All links
to the deleted clause(s) will also be deleted.

To see that no other clause has become pure, consider links to literals in A
or in B. All such links are inherited in the cg-resolvent. As a result, any clause
linked to a deleted parent has gained exactly one link for each link removed due
to deletion of the parent. Now consider links (other than L) to $p_C$. If there are
any such, then C did not become pure, and those links remain. If there are none,
then none (other than L) were removed. The same argument applies to links to
$\neg p_D$. □

® This observation was made independently by Harald Ganzinger. 
Lemma 4 essentially states that “cascading purity” is not possible at the
ground level. Deletion of a pure clause and its links happens only when that 
clause is the parent of a resolvent and cannot cause another clause to become 
pure. 

Lemma 5. Let G be a connection graph with the full set of links, and suppose
that the literals of G have a total ordering. Suppose further that (S, C) is obtained
by a sequence of ordered cg-resolution steps. Then every ordered link in
S must either be in C or have been directly activated.

Proof. Suppose to the contrary that the ordered link $\{l_E, \neg l_F\}$ in S is not present
in C and has not been activated. Since the initial graph G was a full graph, this
link must have been deleted in an earlier cg-resolution step. Hence at least one
of the two clauses E, F is a resolvent, say $E = \{l\} \cup E'$ is the resolvent of C
and D. Then $l$ cannot be maximal in its parent, nor can any occurrence of it be
maximal in any of its ancestors. Thus, the link $\{l_E, \neg l_F\}$ can never have been
activated and hence could never have been deleted. □



One way to interpret the lemma is that for ordered resolution, link deletions 
are no more than a convenient means of preventing links from being activated 
more than once. Specifically, we have 

Theorem 2. Let G be any set of clauses with the full set of links. If the literals 
of G are totally ordered, then, with respect to that ordering, ordered resolution 
and ordered cg-resolution are identical. □ 



We will demonstrate the (affirmation and refutation) strong completeness of 
ordered resolution by proving two theorems. First, as long as no link is activated 
more than once and tautologies are not used as parent clauses, any sequence 
of ordered resolutions must eventually terminate, producing a saturated clause 
set. Secondly, if a formula is unsatisfiable, the saturated clause set contains the 
empty clause. 

We assume the atom set is totally ordered, and that the literal set has been
ordered by making $\neg p$ the immediate successor of p. That ordering (and its
extension to links and clauses) will be referred to as the atom ordering. Observe
that the clause ordering is total and well founded. Observe also that under this 
ordering, an ordered resolvent precedes both of its parents (as long as neither 
parent is a tautology). We use head of a clause to denote the largest literal in 
the clause. 

Theorem 3. If G is a clause set, and if the literals of G have the atom order- 
ing, then ordered resolution must terminate; that is, any sequence of ordered 
resolution steps in which no link is activated more than once and tautologies are 
not used as parent clauses will eventually produce a clause set with the property 
that every ordered link has been activated once. 
Proof. Let G be a set of clauses, and let $A = \{p_1, p_2, \ldots, p_n\}$ be the atom set of
G, where $p_i \prec p_j$ if $i < j$. No ordered resolvent that does not use a tautology
can ever produce a new $p_n$ link, so the number of $p_n$ links is fixed, say there are
$N_n$ $p_n$ links. New (ordered) $p_{n-1}$ links can be produced by activations of (some
of) those $N_n$ $p_n$ links. Thus, the number of (ordered) $p_{n-1}$ links that can ever be
produced is fixed, regardless of the order in which links are activated. Similarly,
new $p_{n-2}$ links can be produced only if $p_n$ or $p_{n-1}$ links are activated. Since only
finitely many such activations are possible, only finitely many new $p_{n-2}$ links
can be produced. Continuing inductively, only finitely many ordered links can
ever exist, so only finitely many ordered cg-resolvents are possible. We emphasize
that the order in which links are activated has no impact on this analysis, and
the proof is complete. □



The proof method for the next theorem® is due to Bachmair and 
Ganzinger [1]; the proof presented below is a refinement of an elegant expo- 
sition by Paliath Narendran. Interpretations in the proof will be described as a 
set of atoms, indicating that atoms in the set are assigned true and atoms not 
in the set are assigned false. 

Theorem 4. Let S be an unsatisfiable set of clauses that is saturated with 
respect to ordered resolution. Then S contains the empty clause. 

Proof. Suppose to the contrary that S does not contain the empty clause. Let
$S = \{C_1, C_2, \ldots, C_k\}$ where $C_i \prec C_j$ if $i < j$. For each $C_i$, iteratively define an
interpretation $I_i$, where $I_0 = \emptyset$, and the interpretation $I_i$ is constructed from $I_{i-1}$
as follows: $I_i = I_{i-1}$ if the maximal literal in $C_i$ is negative or if $I_{i-1}$ satisfies $C_i$.
Otherwise, $I_i = I_{i-1} \cup \{q\}$ where $q$ is the maximal literal in $C_i$. In that case, $C_i$
is said to produce $q$. Let $I = I_k$.

Now, since S is unsatisfiable, there is a first clause $C_i$ that is falsified by I.
Then the head of $C_i$ cannot be positive, since if it were and since I falsifies the
other literals in $C_i$, I would assign true to the head. Let $\neg p$ be the head of $C_i$.
Then p must occur in some earlier clause $C_j$ that produced p. All other literals
in $C_j$ are false since otherwise $C_j$ would not have produced p. This implies that
neither $C_j$ nor $C_i$ is a tautology; in particular, every non-head literal in both
clauses precedes p and is different from p and is falsified by I. Hence $\{p_{C_j}, \neg p_{C_i}\}$
is an ordered link. Resolving produces a clause C that is falsified by I. Moreover,
C precedes both $C_j$ and $C_i$, contradicting the fact that $C_i$ is the first clause
falsified by I. □
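
The interpretation built in this proof can be sketched in a few lines. The encoding is an assumption made for illustration: atoms are positive integers, a negative integer is the negated atom, and the function names are hypothetical. The input is assumed to be saturated under ordered resolution and free of the empty clause; for other inputs the result need not be a model.

```python
def lit_key(lit):
    """Atom ordering on integer literals: p < -p < q < -q whenever p < q."""
    return (abs(lit), lit < 0)

def clause_key(clause):
    """Clause ordering: compare literals from the largest one downwards."""
    return sorted((lit_key(l) for l in clause), reverse=True)

def holds(clause, interp):
    """True if the set of true atoms `interp` satisfies the clause."""
    return any((l > 0) == (abs(l) in interp) for l in clause)

def candidate_model(clause_set):
    """Process clauses in increasing order, adding heads where needed."""
    interp = set()
    for clause in sorted(clause_set, key=clause_key):
        if not clause:
            continue
        head = max(clause, key=lit_key)
        if head > 0 and not holds(clause, interp):
            interp.add(head)          # the clause "produces" its head
    return interp
```

On the saturated set `[{1, 2}, {-1, 2}, {-2, 3}]` the first clause produces atom 2 and the last produces atom 3, and the resulting interpretation satisfies all three clauses.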



Theorem 4 is known but is based on the assumption of saturation. In combi- 
nation with Theorem 3, it can be seen that for ordered cg-resolution, saturation 
criteria need never be tested: The supply of ordered links is simply guaranteed 
to run out, regardless of the order in which they are selected. 
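
A minimal ground sketch of this observation, assuming the same integer encoding of literals (negative integers for negated atoms) and hypothetical function names: every ordered link is activated at most once, tautologies are never used as parents, and the loop stops only because no unactivated ordered link is left, without any explicit saturation test.

```python
def lit_key(lit):
    """Atom ordering: the literal -p is the immediate successor of p."""
    return (abs(lit), lit < 0)

def head(clause):
    return max(clause, key=lit_key)

def is_tautology(clause):
    return any(-lit in clause for lit in clause)

def find_ordered_link(clause_set, activated):
    """Return an unactivated link between a head p and a head -p, if any."""
    for c in clause_set:
        if not c or is_tautology(c) or head(c) < 0:
            continue
        for d in clause_set:
            if (d and not is_tautology(d) and head(d) == -head(c)
                    and (c, d) not in activated):
                return c, d
    return None

def ordered_saturate(clauses):
    """Activate ordered links until none is left; return (clauses, steps)."""
    clause_set = {frozenset(c) for c in clauses}
    activated = set()
    while (link := find_ordered_link(clause_set, activated)) is not None:
        c, d = link
        activated.add(link)
        p = head(c)
        clause_set.add((c - {p}) | (d - {-p}))   # ordered resolvent
    return clause_set, len(activated)
```

Which link is returned first depends on set iteration order, which is arbitrary here; by Theorem 3 this should not affect termination.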

® Ordered resolution is attributed to J. Reynolds [13]. Variations have been proven 
complete by Kowalski and Hayes [9] and by Joyner [8]. 
4 First Order Issues

In the first order case, each literal in a resolvent must of course be instantiated 
with the most general unifier (mgu) of the activated link. One side effect could 
be that the literals of “inherited links” are no longer unifiable, in which case 
there is no inherited link. Even when these literals are unifiable, the mgu may 
be incompatible with other links in the resolvent — see [10]. Such incompatibility 
enables deletion of the link. The point is that the situation is quite different from 
that described by Lemma 4: Literals other than those resolved on may become 
pure, resulting in cascaded pure clauses (see, for example, [7, p. 4]). 

Consider the example in Figure 1. The variables are x, y, z, v, and w.



[Figure 1: connection graph on clauses (1)-(5); surviving labels: the literal P(f(x), b) and the link substitutions {f(x)/y, b/z} and {a/w}.]



Fig. 1. Cascading purity at the first order level. 



Activating the dashed link produces (5), and (2) becomes pure. Removing (2) 
does not immediately purify (3) because of the inherited link to the resolvent. 
But the mgu for this link binds w to b, whereas the mgu for the other link to (3) 
binds w to a; due to this incompatibility, either link may be deleted, rendering 
(3) pure. Note also that if the negated literal in (3) were q(v, a), the link to (2)
would not have been inherited at all. 

This makes lifting with purity difficult, but as will be seen, lifting is a big 
problem in cg-resolution even without purity. 

First we observe that completeness for standard resolution without self- 
resolution (both parent clauses are variants of the same clause) is not open as 
indicated in [7, p. 123]. To see this, it is sufficient to consider P1-resolution [14].
Since one parent is positive and one is negative (later called “mixed”) in each 
resolution step, they cannot be lifted to the same clause. This renders unneces- 
sary some but not all applications of the “copy rule” (in which a new variant of 
a clause together with suitable link occurrences is generated).

Unfortunately, ordered resolution does not substantially improve Eisinger’s 
copying strategy. The reason is that the total ordering on literals required in 
Theorems 2 and 3 above does not lift to the first order level. These results hinge
on the fact that there is exactly one maximal literal per clause, which is not
the case on the first order level (see the following section). To see that literal 
orders do not work, the example^ of [7, p. 125] suffices with the ground atom 
order, where Pab > Pba > Paa > Pbb. Hence, the lifting problems identified by 
Eisinger apply here as well; the only known remedy is to copy enough clauses 
and links to ensure that lifting works. 

4.1 Strong Completeness 

Judicious use of the copy rule can guarantee completeness. Even so, ordered 
resolution does not help prove first order strong completeness. An ordered reso- 
lution proof on the first order level is in general not a cg-resolution proof. The 
culprit here is the same as in the previous section: One cannot totally order first
order literals and preserve the lifting property ($l \prec l'$ implies $l\sigma \prec l'\sigma$ for all
substitutions $\sigma$) required for ordered resolution; for example, unifiable literals
cannot be ordered.

In fact, when enough unifiable literals are present in a clause set, ordered 
resolution gives no guidance at all and boils down to standard resolution. This 
can be demonstrated with a first order version of Eisinger’s cyclic example. 

Example 1. Eisinger’s cyclic derivation is based on the unsatisfiable ground c-graph
displayed on the right.

Consider the first order c-graph obtained by replacing each positive literal $P$ with
$m(x_P, c_P^+)$, each negative literal $\neg P$ with $m(x_P, c_P^-)$, and adding the clauses
$\neg m(x_P, c_P^+) \vee \neg m(x_P, c_P^-)$ for all atoms $P$. The $x_P$ are variables and
the $c_P^+$, $c_P^-$ are constants. In the resulting c-graph:

— There is an unsatisfiable clause set, 

— all literals in both the graph and all of its resolvents are maximal wrt any 
literal order,® 

— and each ground cg-resolution step from Eisinger’s example can be simulated 
by two admissible ordered resolution steps on the first order level involving 
one of the additional clauses. 

Hence no literal order can exclude an “Eisinger-cycle”. This destroys all hope
for achieving strong completeness through orders. 

Just as in the propositional case, the root of the problem is in the destructive 
nature of the calculus: In standard first order resolution, a clause is unaffected 

^ The example is too lengthy to be reproduced here. 

® Depending on the order, some polarities of literals may have to be adjusted or 
arguments of predicates exchanged; the present transformation works for orders in 
which negative literals are larger than positive ones and the first argument has 
precedence over the second one. 
when it becomes the parent of a resolvent, whereas in cg-resolution one of its
links is destroyed. This lifts only from ground proofs in which every instance of 
the activated link is deleted. On the other hand, the copy rule can be restricted 
to clauses that contain a literal which might resolve with more than one other 
clause. 

An alternative to lifting would be to find a termination order that works 
directly on first order connection graphs. However, this seems more difficult than 
for the ground case (see Section 3), in part because the factoring rule copies links 
excessively. 

4.2 Resolution with Selection Functions 

Because total ground orderings become partial when extended to the first order 
case, ordered resolution is too weak a restriction for approximating first order 
cg-resolution. A resolution refinement which does not have this drawback is
resolution with selection functions. A selection function f selects exactly one
literal per clause and must be liftable. This works fine, because the proof of
Theorem 2 can be lifted and literal selection is not ambiguous even for first 
order clauses. There is a drawback, of course: Resolution with arbitrary selection 
functions is incomplete in general. 

On the other hand, resolution with selection functions is complete when no 
factoring is required®, for example, for Horn or unit refutable clause sets. Strong 
completeness (without copying) for these classes was already known by [7], but 
at least the proof via selection functions is much simpler. 



Acknowledgement. Bernhard Beckert noticed an error in an earlier version 
and gave several useful and simplifying suggestions. A discussion with Harald 
Ganzinger helped clarify several issues. 
Abstract. We give a proof of refutational completeness for Extended
Narrowing And Resolution (ENAR), a calculus introduced by Dowek,
Hardin and Kirchner in the context of Theorem Proving Modulo. ENAR
integrates narrowing with respect to a set of rewrite rules on propositions
into automated first-order theorem proving by resolution. Our proof
allows ordering restrictions to be imposed on ENAR and provides general
redundancy criteria, which are crucial for finding nontrivial proofs. On the
other hand, it requires confluence and termination of the rewrite system, 
and in addition the existence of a well-founded ordering on propositions 
that is compatible with rewriting, compatible with ground inferences, to- 
tal on ground clauses, and has some additional technical properties. Such 
orderings exist for hierarchical definitions of predicates. As an example 
we provide such an ordering for a fragment of set theory. 



1 Introduction 

Dowek, Hardin and Kirchner [6] introduce Theorem Proving Modulo and in that 
context the calculus Extended Narrowing And Resolution (ENAR). They show 
completeness of ENAR by transforming proofs in a sequent calculus modulo a 
congruence on formulas into ENAR proofs with respect to the same congruence 
represented by a term rewriting system, using cut elimination for the sequent 
calculus in the process. Dowek and Werner [7] show the cut elimination property 
for the cases of HOL-$\lambda\sigma$, quantifier-free theories and positive theories.

Here we give an alternate completeness proof based on the reduction-of-counterexamples
method developed over recent years [3]. This allows ordering restrictions to be imposed
on the calculus and provides a strong notion of redundancy,
which is crucial for solving larger problems. The proof requires a well-founded 
ordering on propositions with certain properties such as compatibility with the 
rewrite relation. Such orderings exist for hierarchical definitions of predicates. 
As an example we define such an ordering for a small fragment of set theory. 

From the viewpoint of automated theorem proving it is interesting to study 
how the technique for proving refutational completeness can be extended to 
handle skolemization or even quantifiers in formulas. Since logical equivalence 
is lost by skolemization, we have to adapt the notion of soundness of the cal- 
culus accordingly. By capturing the effect of skolemization in the addition of 
skolemization axioms, we can keep logical equivalence for most of the proof. 
Finally, it is interesting to study calculi with built-in theories in order to improve
the efficiency of automated theorem provers. It is generally recognized that
automated theorem provers have problems proving theorems in theories with per- 
mutative axioms like associativity, commutativity, distributivity and the inverse 
law which are common in algebra, and there have been various approaches to 
the integration of these axioms into provers [17,11,16,5,2,8,13,14,15]. A similar 
argument holds for the use of equivalences on the level of logical formulas. State- 
of-the-art resolution theorem provers such as SPASS do a clause normal form 
transformation once at the beginning, which destroys in particular the equiva- 
lences. With some effort it is possible to reconstruct the equivalences [12] and 
take advantage of them, but to us it seems more fruitful to work towards using 
them directly. There has been some work on nonclausal resolution by Bachmair 
and Ganzinger [1], but this does not cover formulas with quantifiers. 

2 Preliminaries 

We consider first-order logic without equality with respect to fixed sets $\mathcal{P}$ of predicate
symbols and $\mathcal{F}$ of function symbols. We assume that $\mathcal{F}$ contains countably
many function symbols of each arity, in order to provide sufficiently many fresh
function symbols for skolemization. An atom is a formula $p(t_1, \ldots, t_n)$ where
$p \in \mathcal{P}$ and $t_1, \ldots, t_n$ are terms. Propositions are built from atoms, $\top$ (truth), $\bot$
(falsity), by the junctors $\wedge$, $\vee$, $\neg$, $\rightarrow$ (implication), $\Leftrightarrow$ (equivalence), and the
quantifiers $\forall$ and $\exists$. We use the double arrow for rewriting: $\Rightarrow$ for a rule or a
single step and $\Rightarrow^{!}$ for the reflexive-transitive closure of rewrites to normal
form, i.e. $s \Rightarrow^{!} t$ iff $s \Rightarrow^{*} t$ and $t$ is irreducible. We write $P|_\pi$ for the subproposition
or subterm of a proposition $P$ at the position $\pi$, and $P[Q]_\pi$ or $P[t]_\pi$ for the
proposition $P$ where we have replaced the subformula or subterm at position $\pi$
by $Q$ or $t$, respectively.

A literal is either an atom or the negation of an atom, and a clause is a
disjunction of literals. We will use constrained clauses of the form $C\ [\mathcal{C}]$ where
$C$ is a clause and $\mathcal{C}$ is a constraint. A syntactic equality constraint $s \approx t$ is
satisfied for those ground substitutions $\sigma$ that unify s and t, i.e. where $s\sigma = t\sigma$.
Analogously, for a fixed given ordering $\succ$ on ground terms, $\sigma$ satisfies an ordering
constraint $s \succ t$ if $s\sigma \succ t\sigma$. We will use constraints that are conjunctions of these
atomic constraints. The meaning of a constrained clause is the set of ground
instances obtained by substitutions that satisfy the constraint.

A (finite) multiset M over a set S is a function from S into the natural
numbers such that $M(x) > 0$ only for finitely many x in S. For each x in S,
$M(x)$ denotes the number of occurrences of x in M. The multiset extension $\succ_{mul}$
of a strict partial ordering $\succ$ is the strict partial ordering on multisets over S
that is defined by $M \succ_{mul} N$ if and only if $M \neq N$ and for all x in S such
that $N(x) > M(x)$ there exists a y in S such that $y \succ x$ and $M(y) > N(y)$.
We will use that the multiset extension of a total ordering is total, that the
multiset extension preserves well-foundedness, and that the ordering on multisets
is dominated by the ordering on their maximal elements.
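
The definition translates almost literally into code. The following sketch assumes multisets are given as Python iterables and the base ordering as a two-argument predicate; `multiset_greater` and `greater` are hypothetical names.

```python
from collections import Counter

def multiset_greater(m, n, greater):
    """True iff m is greater than n in the multiset extension of `greater`.

    `greater(a, b)` is assumed to be a strict partial ordering on elements.
    """
    m, n = Counter(m), Counter(n)
    if m == n:
        return False
    # every x occurring more often in n must be dominated by some y
    # that occurs more often in m
    return all(
        any(greater(y, x) and m[y] > n[y] for y in m)
        for x in n
        if n[x] > m[x]
    )
```

With the usual order on integers, `[3]` is greater than `[2, 2, 1]` because the single 3 dominates every element of the second multiset, which illustrates the domination by maximal elements.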
We consider propositions to be modulo associativity and commutativity (AC)
for $\vee$ and $\wedge$. In particular, clauses that differ only in the order of their literals
are identical. We write $\{t_1/x_1, \ldots, t_n/x_n\}$ for the substitution that replaces $x_i$
by $t_i$ for $i \in \{1, \ldots, n\}$.

We use the following rules for the transformation to clause normal form: 



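$\neg\top \Rightarrow \bot$    (1)
$\neg\bot \Rightarrow \top$    (2)
$\bot \wedge P \Rightarrow \bot$    (3)
$\top \wedge P \Rightarrow P$    (4)
$\bot \vee P \Rightarrow P$    (5)
$\top \vee P \Rightarrow \top$    (6)
$P \Leftrightarrow Q \Rightarrow (P \rightarrow Q) \wedge (Q \rightarrow P)$    (7)
$P \rightarrow Q \Rightarrow \neg P \vee Q$    (8)
$\neg\neg P \Rightarrow P$    (9)
$\neg(P \vee Q) \Rightarrow \neg P \wedge \neg Q$    (10)
$\neg(P \wedge Q) \Rightarrow \neg P \vee \neg Q$    (11)
$\neg(\forall x\, P) \Rightarrow \exists x\, \neg P$    (12)
$\neg(\exists x\, P) \Rightarrow \forall x\, \neg P$    (13)
$P \vee (Q_1 \wedge Q_2) \Rightarrow (P \vee Q_1) \wedge (P \vee Q_2)$    (14)
$\forall x\, P \Rightarrow P\{z/x\}$    (15)
$\exists x\, P \Rightarrow P\{f(y_1, \ldots, y_n)/x\}$    (16)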



where in (15) $z$ is a new variable, and in (16) $f$ is a fresh function symbol and
$x, y_1, \ldots, y_n$ are the free variables of P. A clause normal form of a proposition P
is obtained by exhaustively applying these rules, with the restriction that (1)- 
(13) must be applied before (15) and (16), in order to apply the quantifier 
rules only below positive contexts. The clause normal form transformation is 
nondeterministic by the choice of new variables and fresh function symbols. This 
is not a problem, as in any context where a clause normal form is needed any one 
will do, i.e. this is don’t-care-nondeterminism. Note that by this definition clauses 
are not sets, but are formed from the same symbols as logical propositions. In 
particular, the empty clause is $\bot$. Also, by equivalence modulo AC these clauses
behave as multisets, where the same element may occur several times. 
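
For the quantifier-free fragment, the effect of these rules can be sketched as a recursive function instead of a literal rewrite system. The tuple encoding of propositions and the names `nnf`, `clauses` and `cnf` are assumptions for illustration; the quantifier rules (12), (13), (15) and (16) are not covered. A clause is a tuple of literals, so repeated literals are kept, the empty tuple plays the role of the empty clause, and a true proposition yields no clauses.

```python
def nnf(p, positive=True):
    """Expand implication and equivalence (7), (8) and push negation inward (9)-(11)."""
    tag = p[0]
    if tag == "atom":
        return p if positive else ("not", p)
    if tag in ("top", "bot"):                  # negated truth constants
        return ("top",) if (tag == "top") == positive else ("bot",)
    if tag == "not":
        return nnf(p[1], not positive)
    if tag == "imp":
        return nnf(("or", ("not", p[1]), p[2]), positive)
    if tag == "iff":
        return nnf(("and", ("imp", p[1], p[2]), ("imp", p[2], p[1])), positive)
    if tag in ("and", "or"):
        flipped = "or" if tag == "and" else "and"
        return (tag if positive else flipped,
                nnf(p[1], positive), nnf(p[2], positive))
    raise ValueError(f"unknown connective: {tag}")

def clauses(p):
    """Clauses of a proposition in negation normal form."""
    tag = p[0]
    if tag == "top":
        return []                              # true: no clauses
    if tag == "bot":
        return [()]                            # false: the empty clause
    if tag == "and":
        both = clauses(p[1]) + clauses(p[2])
        return [()] if () in both else both    # falsity absorbs a conjunction
    if tag == "or":                            # distribution, rule (14)
        return [c + d for c in clauses(p[1]) for d in clauses(p[2])]
    return [(p,)]                              # a literal

def cnf(p):
    return clauses(nnf(p))
```

For example, `cnf(("iff", P, Q))` with atoms `P` and `Q` yields the two clauses corresponding to ¬P ∨ Q and ¬Q ∨ P.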

If we consider free variables to be universally quantified then (1)-(15) are log- 
ical equivalences, while (16) is only an implication from right to left. It becomes 
an equivalence if we add the implication in the other direction: 

$\forall y_1, \ldots, y_n.\ (\exists x.P) \rightarrow P\{f(y_1, \ldots, y_n)/x\}$    (17)

We call (17) the skolem axiom for the skolem function symbol f with respect
to $\exists x.P$. We call a set of skolem axioms S fresh with respect to a set of propositions
N if no skolem function symbol of S occurs in N, and there is only one
skolem axiom for every skolem function symbol in S. A fresh set of skolem axioms
is always obtained when fresh function symbols are used for skolemization.



Lemma 1 Let N be a set of propositions and S a set of skolem axioms that is
fresh with respect to N, and let I be a model of N. Then there exists a model I'
of $N \cup S$.

Proof: Sketch: define the interpretation of skolem functions in I' so that they
provide witnesses for the true instances of their corresponding existential formula.
Since S is fresh this is possible without changing the truth value of N.

Lemma 1 isolates the argument that the clause normal form transformation 
preserves satisfiability. By adding skolem axioms to our theory at the beginning, 
we get logical equivalence for all later steps. This simplifies our arguments below. 

3 Rewriting on Propositions 

Let $R_P$ be a set of rewrite rules on first-order propositions such that left-hand
sides are atomic, let $R_T$ be a set of rewrite rules on terms, and let $R = R_P \cup R_T$.
The right-hand side of a rule in R may contain only free variables that also
occur in the left-hand side. We write $T_{R_P}$ for the logical meaning of $R_P$, which is
the set of logical equivalences $\{l \Leftrightarrow r \mid l \Rightarrow r \in R_P\}$. The intended meaning of the
rules in $R_T$ is equality. However, equality is not directly available to us, since we
use first-order logic without built-in equality. As an alternative, we may apply
Leibniz’ equality to atomic propositions to obtain a set of equivalences that
capture the logical meaning of $R_T$. That is, we let

$T_{R_T} = \{A[l]_\pi \Leftrightarrow A[r]_\pi \mid l \Rightarrow r \in R_T$, $A$ an atom, and $\pi$ a position in $A\}$.

This is adequate, since any model that satisfies $T_{R_T}$ can be factored through the
congruence induced by $R_T$ to obtain a model of $R_T$ with respect to first-order
logic with equality, while preserving the truth value of propositions. Finally, we
let $T_R = T_{R_P} \cup T_{R_T}$. The theory $T_R$ is compatible with the rewrite rules in R
in the sense of Dowek, Hardin and Kirchner [6]. It is somewhat smaller than
the one given there, as it includes equivalences only for single rewrite steps and
relies on the properties of logical equivalence for reflexive-transitive closure and
for closure under contexts of logical operators and substitutions.

We assume that R is confluent and terminating modulo AC for $\wedge$ and $\vee$,
and write $R(P)$ for the normal form of a proposition P with respect to R.

4 The Inference System ENAR 

An inference system is a set of inferences on constrained clauses. Each inference
has a main premise $C$, zero or more side premises $C_1, \ldots, C_n$, and a conclusion
$D$, and is written

$\dfrac{C \quad C_1 \;\ldots\; C_n}{D}$
The main premise and the side premises have different roles in the completeness
proof and in the resulting notion of redundancy for inferences. 

The calculus of Extended Narrowing And Resolution (ENAR) consists of the 
following two rules operating on constrained clauses: 



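Extended Resolution

$\dfrac{\neg A_1 \vee \ldots \vee \neg A_n \vee C\ [\mathcal{C}_1] \qquad B_1 \vee \ldots \vee B_m \vee D\ [\mathcal{C}_2]}{C \vee D\ [\mathcal{C}_1 \wedge \mathcal{C}_2 \wedge A_1 \approx \ldots \approx A_n \approx B_1 \approx \ldots \approx B_m]}$

where $A_1 \approx \ldots \approx A_n \approx B_1 \approx \ldots \approx B_m$ is an abbreviation for
$A_1 \approx A_2 \wedge \ldots \wedge A_1 \approx A_n \wedge A_1 \approx B_1 \wedge \ldots \wedge A_1 \approx B_m$. The main premise of Extended
Resolution is $\neg A_1 \vee \ldots \vee \neg A_n \vee C\ [\mathcal{C}_1]$.

Extended Narrowing

$\dfrac{U\ [\mathcal{C}]}{\mathrm{cl}(U[r]_\pi)\ [\mathcal{C} \wedge (U|_\pi \approx l)]}$

where $l \Rightarrow r$ is a rule in R and $U|_\pi$ is not a variable.

Here cl(P) denotes one of the clauses in a clause normal form of P.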

This calculus is slightly different from the original one [6], as it doesn’t use 
equality modulo a congruence. On the other hand, we allow the use of rewrite 
rules on terms. This is somewhat closer to an implementation, as in general the 
solving of the constraint in the original calculus would also be done by some form 
of narrowing. Note that by the redundancy criteria which we introduce below 
and by the use of equality constraints we cover also refined variants of narrowing 
such as normalized and basic narrowing. To keep the exposition simple we do not 
cover the case of narrowing modulo associativity and commutativity (AC). The
extension of our calculus to AC can be done using well-known techniques [11]. 

5 The Ordering on Propositions 

We say that an ordering $\succ$ has the multiset property for $\vee$ if $L \succ L'$ for any
literal $L'$ in a clause $C$ implies $L \succ C$. We assume an ordering $\succ$ on propositions
that is well-founded, total on ground clauses, that has the multiset property
for $\vee$, that satisfies $\neg A \succ A$ for any atom $A$, that is compatible with rewriting,
i.e., C (;^), and $A \Rightarrow P$ implies $A \succ B$ for every ground instance $B$ of an
atom in $P$. The latter implies compatibility with Extended Narrowing, i.e. $C \succ D$
for every ground inference with main premise C and conclusion D. Compatibility 
with Extended Resolution follows by the multiset property. Note that we have 
replaced compatibility with rewrite rules and with contexts by the somewhat 
weaker compatibility with rewriting, i.e. the application of rewrite rules under 
contexts. We have done this because it is difficult to obtain compatibility with 
contexts for quantifiers under negative contexts. 

The property that A is greater than any ground instance of some B with 
free variables is not satisfied by typical term orderings, in particular not by 
simplification orderings. The separation of propositions and terms can be used 
to avoid this problem by giving predicate symbols precedence over terms. This
technique is applicable in particular for hierarchical definitions of predicates by 
equivalences, e.g. in set theory. We present an example below. 



6 Constructing Herbrand Models That Satisfy the 
Equivalences 

We now define a function, called $\mathrm{closure}_R$, that maps a Herbrand interpretation
$H_I$ for ground atoms that are irreducible by $R$ to a Herbrand interpretation
for all ground atoms. The mapping is defined in such a way that the interpretation
of irreducible atoms is not changed and $T_R$ becomes true in $\mathrm{closure}_R(H_I)$.

Let $H_I$ be a set of ground atoms irreducible by R. We construct a tree from
each closed proposition, whose inner nodes are labeled by $\wedge$, $\vee$ and $\neg$ and
whose leaves are irreducible ground atoms. $\Leftrightarrow$ and $\rightarrow$ are always expanded using
rules (7) and (8).

1. The tree for a reducible proposition P (w.r.t. R) is the tree for the normal 
form of P with respect to R. 

2. The tree for an irreducible ground atom H is a leaf labeled A. 

3. The tree for an irreducible proposition P A Q is labeled A at the root and 
has as children the trees for P and Q. 

4. The tree for an irreducible proposition P V Q is labeled V at the root and 
has as children the trees for P and Q. 

5. The tree for an irreducible proposition -<P is labeled -■ at the root and has 
the tree for P as the only child. 

6. The tree for an irreducible proposition \/x.P is labeled A at the root and has 
as children all trees for P{t/x} where t is a ground term. 

7. The tree for an irreducible proposition 3x.P is labeled V at the root and has 
as children all trees for P{t/x} where t is a ground term. 

Lemma 2 All the branches of the tree are finite.

Proof: The only possible source of nontermination is the interaction of rewriting
and instantiation of quantifiers. One of the properties of $\succ$ is that a rewrite step
followed by instantiation decreases all propositions in the ordering. Since $\succ$ is a
well-founded ordering this implies termination. □

Now we label the nodes of the tree with truth values true or false from the bottom
up. A leaf $A$ is labeled true if $A \in H_i$, and false otherwise. A node labeled with $\wedge$
is labeled true if all its children are labeled true, and false otherwise. A node
labeled with $\vee$ is labeled true if at least one of its children is labeled true, and
false otherwise. A node labeled $\neg$ is labeled true if its child is labeled false and
vice versa. Since all branches are finite all the nodes are labeled. We let $A$ belong to
$\mathrm{closure}_R(H_i)$ if the root of the tree for $A$ is labeled true.
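
The tree construction and bottom-up labeling can be sketched in a few lines of Python. This is only an illustration under strong assumptions that are not part of the paper: the set of ground terms is replaced by a small finite tuple `universe` (the real Herbrand universe is usually infinite), rewrite rules apply to atoms only and are modeled by a hypothetical function `rewrite` that returns the right-hand side for a reducible ground atom and `None` otherwise, and propositions use an ad hoc tuple encoding. Rewriting is done lazily when an atom is reached, which is assumed to give the same labels as normalizing first. Termination relies on the well-founded ordering, exactly as in Lemma 2.

```python
# Propositions as nested tuples (hypothetical encoding, not from the paper):
#   ("atom", name, args), ("not", P), ("and", P, Q), ("or", P, Q),
#   ("forall", x, P), ("exists", x, P); terms and variables are strings.

def substitute(p, x, t):
    """Replace free occurrences of variable x in p by the ground term t."""
    tag = p[0]
    if tag == "atom":
        return ("atom", p[1], tuple(t if a == x else a for a in p[2]))
    if tag == "not":
        return ("not", substitute(p[1], x, t))
    if tag in ("and", "or"):
        return (tag, substitute(p[1], x, t), substitute(p[2], x, t))
    if tag in ("forall", "exists"):
        if p[1] == x:          # x is rebound here, nothing to replace below
            return p
        return (tag, p[1], substitute(p[2], x, t))
    raise ValueError(f"unknown proposition: {p!r}")

def closure_truth(p, h, rewrite, universe):
    """Label of the root of the tree for the closed proposition p."""
    tag = p[0]
    if tag == "atom":
        rhs = rewrite(p)
        if rhs is not None:                      # reducible: use the reduct
            return closure_truth(rhs, h, rewrite, universe)
        return p in h                            # irreducible leaf
    if tag == "not":
        return not closure_truth(p[1], h, rewrite, universe)
    if tag == "and":
        return (closure_truth(p[1], h, rewrite, universe)
                and closure_truth(p[2], h, rewrite, universe))
    if tag == "or":
        return (closure_truth(p[1], h, rewrite, universe)
                or closure_truth(p[2], h, rewrite, universe))
    if tag == "forall":                          # conjunction over all instances
        return all(closure_truth(substitute(p[2], p[1], t), h, rewrite, universe)
                   for t in universe)
    if tag == "exists":                          # disjunction over all instances
        return any(closure_truth(substitute(p[2], p[1], t), h, rewrite, universe)
                   for t in universe)
    raise ValueError(f"unknown proposition: {p!r}")

def subset_rule(atom):
    """Illustrative rule: subset(x, y) -> forall z. (not z in x) or (z in y)."""
    _, name, args = atom
    if name == "subset":
        x, y = args
        return ("forall", "z",
                ("or", ("not", ("atom", "in", ("z", x))), ("atom", "in", ("z", y))))
    return None

UNIVERSE = ("a", "b")                      # assumed finite stand-in for ground terms
H = {("atom", "in", ("b", "a"))}           # irreducible atoms taken to be true
```

With these definitions, `closure_truth(("atom", "subset", ("a", "b")), H, subset_rule, UNIVERSE)` evaluates the tree for the reducible atom `subset(a, b)` by first rewriting it and then instantiating the quantifier with every term of the assumed universe.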

Lemma 3 Let $T$ be a tree for a closed proposition $P$. Then the root of $T$ is
labeled true if and only if $P$ is true in $\mathrm{closure}_R(H_i)$.

Proof: By structural induction over the proposition. □

Lemma 4 $\mathrm{closure}_R(H_i) \models T_R$.

Proof: Consider some equivalence $A \Leftrightarrow P$ in $T_R$. The tree for $A$ is identical to
the tree for $P$, because both are the tree for the normal form of $A$ with respect
to $R$, which is unique by confluence and termination of $R$. Hence their truth
value is equal and the equivalence holds. □



7 Refutational Completeness 



To be able to lift narrowing steps on terms to constrained narrowing we use the
standard technique that considers only reduced ground instances on the ground
level. This technique was originally used to show completeness of basic
narrowing [9], and later for constrained or basic first-order calculi [4,10]. Formally, a
substitution $\sigma$ is reduced if $x\sigma$ is irreducible with respect to $R_T$ for all variables $x$.
An instance is called reduced if it is obtained by a reduced substitution. We write
$\mathrm{gnd}(N)$ for the set of ground instances of clauses in $N$, and $\mathrm{rgnd}_R(N)$ for the
subset of reduced ground instances of clauses in $N$. Note that ground instances
have to satisfy the constraint. We also consider reduced ground inferences. Since
premises can always be made ground, the restriction to reduced instances of
inferences restricts only the instantiation of newly introduced variables in
conclusions. We will show that ENAR is refutationally complete by showing that
ENAR has the reduction property for counterexamples, using the approach of
Bachmair and Ganzinger [3].

We will define mutually recursive functions $I$ and $P$ that map a set of ground
clauses to a set of ground atoms. Here $I$ stands for “interpretation” and $P$ for
“produced”. Let $M$ be a set of ground clauses. $P(M)$ defines the interpretation
of ground atoms that are irreducible with respect to $R$, and $I(M)$ extends $P(M)$
to all ground atoms, using the function $\mathrm{closure}_R$ defined above.

The definition is with respect to the well-ordering $\succ$ on ground clauses,
considering the ground clauses in $M$ in turn. For some ground clause $C$ we write
$M^{\prec C}$ ($M^{\preceq C}$) for the ground clauses in $M$ that are smaller than $C$ (less than or
equal to $C$).



$$P(M) = \bigcup_{C \in M} \Delta(M, C)$$

$$\Delta(M, C) = \begin{cases} \{A\} & \text{if } C \text{ is false in } I(M^{\prec C}),\ C = C' \vee A \vee \dots \vee A \text{ where } A \succ C', \\ & \text{and } A \text{ is irreducible by } R; \text{ or} \\ \emptyset & \text{otherwise.} \end{cases}$$

$$I(M) = \mathrm{closure}_R(P(M))$$

Finally we define the interpretation of a set of (non-ground) clauses $N$ by

$$I(N) = I(\mathrm{rgnd}_R(N)).$$

We write $N_C$ for $\mathrm{rgnd}_R(N)^{\prec C}$. Note that $\Delta(M, C) = \Delta(M^{\prec C}, C) = \Delta(N_C, C)$,
since this is the part of $M$ that is used recursively. Since we start the definition
of $I(N)$ with the set of reduced ground instances of $N$, the set $\Delta(\mathrm{rgnd}_R(N), C)$
is the increment that takes us from $P(\mathrm{rgnd}_R(N)^{\prec C})$ to $P(\mathrm{rgnd}_R(N)^{\preceq C})$.

We say that a ground clause $C$ produces $A$ if $A \in \Delta(\mathrm{rgnd}_R(N), C)$. We say
that a ground clause $C$ is a counterexample for $I(N)$ if it is in $\mathrm{rgnd}_R(N)$ and
false in $I(N)$. Let $C$ be the least counterexample for $I(N)$ and consider a ground
inference with main premise $C$ and conclusion $D$ such that $C \succ D$. We say
that this inference reduces the counterexample $C$ (with respect to $I(N)$) if $I(N) \models \neg D$.
An inference system Calc has the reduction property for counterexamples (with
respect to $I$) if there is a reduced ground instance of an inference in Calc that
reduces $C$ with respect to $I(N)$ for any set $N$ of ground clauses such that $I(N)$
has the least counterexample $C \neq \bot$.
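
To make the definitions of produced atoms and least counterexamples concrete, here is a minimal Python sketch for the purely propositional ground case. It assumes an empty rewrite system, so that every atom is irreducible and the closure operation is the identity; this is a simplification for illustration and not the construction of the paper in full. Atoms are integers ordered by `<`, a literal is a pair `(atom, positive)`, and the clause ordering is the multiset extension of the literal ordering with a negative literal above its positive counterpart. All function names are hypothetical.

```python
def lit_key(lit):
    """Literal ordering: by atom first, and a negative literal above its atom."""
    atom, positive = lit
    return (atom, 0 if positive else 1)

def clause_key(clause):
    """For a total literal ordering, the multiset extension coincides with
    lexicographic comparison of the literals sorted in descending order."""
    return tuple(sorted((lit_key(l) for l in clause), reverse=True))

def is_true(clause, interp):
    """A ground clause is true if one of its literals holds in interp."""
    return any((atom in interp) == positive for atom, positive in clause)

def candidate_model(clauses):
    """Process clauses in ascending order; a clause that is false so far and
    whose maximal literal is positive produces that atom."""
    interp = set()
    for clause in sorted(clauses, key=clause_key):
        if not clause or is_true(clause, interp):
            continue
        atom, positive = max(clause, key=lit_key)
        if positive:
            interp.add(atom)
    return interp

def least_counterexample(clauses):
    """Smallest clause that is false in the candidate model, or None."""
    interp = candidate_model(clauses)
    for clause in sorted(clauses, key=clause_key):
        if not is_true(clause, interp):
            return clause
    return None

# 1, (not 1) or 2, not 2: an unsatisfiable ground clause set.
CLAUSES = [((1, True),), ((1, False), (2, True)), ((2, False),)]
```

On `CLAUSES` the first two clauses are productive, and the least counterexample is the negative unit clause, whose maximal literal can be resolved against the clause that produced atom 2. This mirrors case (2.2) of the proof of Lemma 5.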

Lemma 5 ENAR has the reduction property for counterexamples. 

Proof: Let $N$ be a set of clauses, let $C$ be the least counterexample for $I(N)$ and
suppose $C \neq \bot$. Then $C$ is a reduced ground instance of a clause $\hat C$ in $N$ and
has the form $C' \vee L \vee \dots \vee L$ where $L \succ C'$ for some literal $L$.

(1) Suppose $L$ is reducible by $R$. We have $L = A$ or $L = \neg A$ and $A$ is
reducible by some rule $B \to \hat P$ in $R$, where $A = B\sigma$ and $P = \hat P\sigma$. We consider
the single $R$-step $A \to P$ applied to $C[A]_\pi$, resulting in the formula $C[P]_\pi$
that is false in $I(N)$, since the equivalence $A \Leftrightarrow P$ holds in $I(N)$. Any clause
normal form of $C[P]_\pi$ implies $C[P]_\pi$, as skolemization is an implication in the
reverse direction. Since $C[P]_\pi$ is false in $I(N)$, the clause normal form is also
false in $I(N)$, and there is a ground instance of a clause in the CNF that is false
in $I(N)$. As $C$ is a reduced ground instance of $\hat C$, the position $\pi$
cannot be a variable position of $\hat C$. Hence there exists an Extended Narrowing
inference

$$\frac{\hat C[\hat A]_\pi \; [\mathcal{C}]}{D_i \; [\mathcal{C} \wedge \hat A \approx B]}$$

such that $D_1 \wedge \dots \wedge D_n$ is a clause normal form of $\hat C[\hat P]_\pi$ and $D_i \; [\mathcal{C} \wedge \hat A \approx B]$ has
a ground instance $D$ that is false in $I(N)$ for some $i \in \{1, \dots, n\}$. Since newly
introduced variables are unconstrained we may even choose $D$ to be a reduced
ground instance of $D_i$. Hence the inference above reduces the counterexample $C$.

(2) Otherwise L is irreducible by R. 

(2.1) Suppose $L$ is positive, i.e. $L = A$ for some ground atom $A$. Since $C$
is false in $I(N)$, $A$ is false in $I(N)$ and in $P(N)$. All other literals are smaller
than $A$, hence their truth value depends only on the truth values of irreducible
atoms smaller than $A$, which are the same in $I(N)$ and in $I(N_C)$. Therefore $C$
is false in $I(N_C)$. Hence $C$ produces $A$ and is true in $I(N)$, a contradiction.

(2.2) Otherwise $L$ is negative, that is, $L = \neg A$ for some atom $A$. Since $L$
is false, $A$ is true in $I(N)$, and by irreducibility $A$ must be in $P(N)$. This $A$ is
produced by some reduced ground instance $D$ of a clause in $N$ with $C \succ D$.
Then $D = D' \vee A \vee \dots \vee A$, $A \succ D'$, and $D'$ is false in $I(N_D)$. No clause
greater than or equal to $D$ can make a literal in $D'$ become true, hence $D'$ is
false in $I(N)$. We can resolve $C$ and $D$:

$$\frac{C' \vee \neg A \vee \dots \vee \neg A \qquad D' \vee A \vee \dots \vee A}{C' \vee D'}$$

This is a reduced ground instance of Extended Resolution that reduces $C$. □



Lemma 6 Let $N$ be a set of clauses that is closed under ENAR. Then either
$\bot \in N$ or $I(N) \models \mathrm{rgnd}_R(N)$.

Proof: Suppose $C \neq \bot$ is the least counterexample in $N$ for $I(N)$. Then by the
reduction property there exists a reduced ground inference

$$\frac{C \quad C_1 \ \dots \ C_n}{D}$$

that is an instance of an inference in ENAR that reduces $C$. That implies that $D$ is
false in $I(N)$, and since $N$ is closed under ENAR the ground clause $D$ is a reduced
ground instance of $N$ and hence a smaller counterexample, a contradiction to
the minimality of $C$. □

A reduced ground instance $C\sigma$ of a constrained clause $C$ is called redundant
in $N$ (with respect to $R$) if there exist reduced ground instances $C_1\sigma_1, \dots, C_k\sigma_k$
of clauses $C_1, \dots, C_k$ in $N$ such that $C\sigma \succ C_i\sigma_i$ for $i = 1, \dots, k$, and
$T_R \cup \{C_1\sigma_1, \dots, C_k\sigma_k\} \models C\sigma$.

A reduced ground instance

$$\frac{C_1\sigma \ \dots \ C_n\sigma}{C\sigma} \quad \text{of an inference} \quad \frac{C_1 \ \dots \ C_n}{C}$$

where $C_n\sigma$ is the main premise is called redundant in $N$ (with respect to $R$) if
either one of the premises $C_1\sigma, \dots, C_n\sigma$ is redundant, or if there exist reduced
ground instances $D_1\sigma_1, \dots, D_k\sigma_k$ of $N$ such that $C_n\sigma \succ D_i\sigma_i$ for $i = 1, \dots, k$ and
$T_R \cup \{D_1\sigma_1, \dots, D_k\sigma_k\} \models C\sigma$. A non-ground clause or inference is redundant if
all its reduced ground instances are redundant. The following well-known lemma
allows us to delete redundant clauses without losing redundancy.

Lemma 7 Let $N$ be a set of constrained clauses and $M$ the set of redundant
clauses in $N$. If a constrained clause $C$ is redundant in $N$ then it is redundant
in $N \setminus M$.

Proof: Consider the least reduced ground instance of C that violates this prop- 
erty and derive a contradiction. □ 

A set $N$ of clauses is called saturated up to redundancy (with respect to ENAR)
if all inferences in ENAR from premises in $N$ are redundant.

Lemma 8 Let $N$ be a set of clauses that is saturated up to redundancy with
respect to ENAR. Then either $\bot \in N$ or $I(N) \models \mathrm{rgnd}_R(N)$.

Proof: Suppose $N$ is saturated up to redundancy, $\bot \notin N$ and $I(N) \not\models
\mathrm{rgnd}_R(N)$. Let $C$ be the least counterexample for $I(N)$ among the reduced
ground instances of $N$. Since ENAR has the reduction property for
counterexamples, there exists a reduced ground instance of an inference in ENAR that
reduces $C$ to some clause $D$ that is also false in $I(N)$. Since $N$ is saturated
w.r.t. ENAR, this inference is redundant, hence $D$ follows from $T_R$ and reduced
ground instances smaller than $C$. By minimality of $C$ these are true in $I(N)$,
and $D$ must be true in $I(N)$ as well, a contradiction. □

A theorem proving derivation is a sequence of sets of clauses $N_0 \vdash N_1 \vdash \dots$
such that for all steps $N_i \vdash N_{i+1}$, $i \geq 0$, either (1) $N_{i+1} = N_i \cup \{C\}$ for some
constrained clause $C$ such that $T_R \cup N_i \cup S_i \models \{C\}$ where $S_i$ is a set of skolem
axioms, or (2) $N_{i+1} = N_i \setminus \{C\}$ for some constrained clause $C$ which is redundant
in $N_i$, and $\bigcup_i S_i$ is fresh with respect to $T_R \cup N_0$. In case (1) we call the step a
deduction step and in case (2) a deletion step. For such a derivation the set of
persistent clauses $N_\infty$ is defined as $N_\infty = \bigcup_{i \geq 0} \bigcap_{j \geq i} N_j$.

A simplification may be viewed as two derivation steps $\{C\} \cup N \vdash \{C, D\} \cup
N \vdash \{D\} \cup N$. That is, a clause $C \in N$ may be simplified to a clause $D$ if
$T_R \cup \{C\} \cup N \models D$ and $C$ is redundant in $\{D\} \cup N$.

For example, tautologies are always redundant, and reduction with $R$ is a
simplification. A subsumed clause that has at least one literal more than the
clause that subsumes it is also redundant by this definition. However, the case
where a clause subsumes one of its instances is not covered, as the individual
ground instances do not decrease with respect to $\succ$. At the price of some technical
complications it is possible to extend the definition to also cover that case.

We will now show that certain properties are preserved when going from $N_0$
to the limit or vice versa.

Lemma 9 Let $N$ be a set of constrained clauses, let $I$ be a model of $T_R$, and let
$N_0 \vdash N_1 \vdash \dots$ be a theorem proving derivation. If all reduced ground instances
of $N_\infty$ are true in $I$ then all reduced ground instances of $\bigcup_i N_i$ are true in $I$.

Proof: Consider some reduced ground instance $C$ of some clause $\hat C$ in $\bigcup_i N_i$. If
$\hat C$ is in $N_\infty$ then $C$ is true by assumption. Otherwise $\hat C$ has been removed by
some deletion step $N_i \vdash N_{i+1}$, hence it is redundant in $N_i$ and thus in $\bigcup_i N_i$.
Let $M$ be the set of all redundant clauses in $\bigcup_i N_i$, then by the above argument
$(\bigcup_i N_i) \setminus M \subseteq N_\infty$. Hence $\hat C$ is redundant in $N_\infty$ by Lemma 7, and there exist
reduced ground instances of $N_\infty$ that together with $T_R$ imply $C$ and are true
in $I$. We conclude that $C$ is true in $I$. □

This implies in particular that all reduced ground instances of $N_0$ are true in $I$.
We say that a clause is unconstrained if its constraint is $\top$. A set of clauses is
unconstrained if all its clauses are unconstrained.




A Model-Based Completeness Proof of Extended Narrowing and Resolution



Lemma 10 Let $N$ be a set of unconstrained clauses. Then $\mathrm{rgnd}_R(N) \cup T_{R_T} \models
\mathrm{gnd}(N)$.

Proof: Consider some ground instance $C$ in $\mathrm{gnd}(N)$ of a clause $\hat C$ in $N$. Then $\hat C =
U\,[\top]$ and $C = U\sigma$. By normalizing $\sigma$ with respect to $R_T$ we obtain the ground
substitution $\tau = \{x\sigma{\downarrow}_{R_T}/x \mid x \neq x\sigma\}$. The instance $C' = U\tau$ is reduced and
is also a ground instance of $\hat C$ that solves the trivial constraint $\top$, hence it is
in $\mathrm{rgnd}_R(N)$. All the changes of atoms by the reduction from $C$ to $C'$ are covered
by equivalences in $T_{R_T}$, hence $C$ is a consequence of $\mathrm{rgnd}_R(N) \cup T_{R_T}$. □

Corollary 11 Let $N$ be a set of unconstrained clauses, and let $I$ be a model of
$T_R$. If $I$ is a model of all reduced ground instances of $N$, then $I$ is a model of
all ground instances of $N$.

Lemma 12 Let $N_0 \vdash N_1 \vdash \dots$ be a theorem proving derivation. If $T_R \cup N_0$ is
satisfiable then $T_R \cup N_\infty$ is satisfiable.

Proof: Suppose we are given a model $I_0$ of $T_R \cup N_0$. Let $S = \bigcup_i S_i$ be the set of
skolem axioms used in the derivation. Then by Lemma 1 there exists a model $I$
of $T_R \cup N_0 \cup S$. Furthermore, $T_R \cup N_0 \cup S$ is logically equivalent to $T_R \cup N_i \cup S$
for $i \geq 0$, hence $I \models N_i$ for all $i \geq 0$. We conclude $I \models N_\infty$. □

A theorem proving derivation is called fair (with respect to ENAR) if all
inferences in ENAR from clauses in $N_\infty$ are redundant in $N_i$ for some $i \geq 0$. Since the
conclusions of ground inferences are always smaller than the main premise, and
since they imply themselves, an inference can be made redundant by a deduction
step that adds its conclusion. Thus a fair derivation is obtained by considering
inferences in a fair way, i.e. not delaying an inference ad infinitum, and adding
their conclusion if they are not already known to be redundant by some suitable
sufficient criterion.

Lemma 13 Let $N_0 \vdash N_1 \vdash \dots$ be a fair theorem proving derivation. Then $N_\infty$
is saturated up to redundancy.

Proof: By fairness we get that every inference from premises in $N_\infty$ is redundant
in $\bigcup_i N_i$, and by Lemma 7 in $N_\infty$. □

Theorem 14 Let $N_0 \vdash N_1 \vdash \dots$ be a fair theorem proving derivation such that
$N_0$ is unconstrained. Then $N_0$ is inconsistent if and only if $N_\infty$ contains the
empty clause.

Proof: Since the derivation is fair, $N_\infty$ is saturated up to redundancy. Thus
either $\bot \in N_\infty$ or $I(N_\infty) \models T_R \cup \mathrm{rgnd}_R(N_\infty)$ by Lemma 8. Let $S$ be the
set of skolem axioms used in the derivation. If $\bot \in N_\infty$ then $T_R \cup N_0 \cup S$
is inconsistent, and in turn $T_R \cup N_0$ is inconsistent by Lemma 1, since $S$ is
fresh. Otherwise $I(N_\infty) \models \mathrm{rgnd}_R(N_\infty)$, $I(N_\infty) \models \mathrm{rgnd}_R(N_0)$ by Lemma 9, and
$I(N_\infty) \models \mathrm{gnd}(N_0)$ by Lemma 10, and $T_R \cup N_0$ is consistent. □

That is, ENAR is refutationally complete with respect to $T_R$.

Looking back at the proof, in particular Lemma 5, we see that we have 
indeed proved refutational completeness of a more restricted inference system 
that constrains inferences to maximal atoms: 



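Extended Resolution

$$\frac{\neg A_1 \vee \dots \vee \neg A_n \vee C \; [\mathcal{C}_1] \qquad B_1 \vee \dots \vee B_m \vee D \; [\mathcal{C}_2]}{C \vee D \; [\mathcal{C}_1 \wedge \mathcal{C}_2 \wedge A_1 \approx \dots \approx A_n \approx B_1 \approx \dots \approx B_m \wedge A_1 \succ C \wedge A_1 \succ D]}$$

Extended Narrowing

$$\frac{A \vee C \; [\mathcal{C}]}{\mathrm{cl}(A[r]_\pi \vee C) \; [\mathcal{C} \wedge A|_\pi \approx l \wedge A \succ C]}$$

where $l \to r$ is a rule in $R$ and $A|_\pi$ is not a variable.

The ordering constraints are interpreted by the given well-ordering $\succ$. To make
this useful in practice it is of course necessary to provide a constraint solver.
Note however, that it is sound to discard the ordering constraints whenever they
are too hard to solve.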



8 Example 

We consider an example from set theory given by Plaisted and Zhu [12]. Suppose 
we have the rewrite rules 



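$$x \approx y \;\to\; x \subseteq y \wedge y \subseteq x \qquad (18)$$

$$x \subseteq y \;\to\; \forall z\,(z \in x \Rightarrow z \in y) \qquad (19)$$

$$x \in y \cap z \;\to\; x \in y \wedge x \in z \qquad (20)$$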

that describe a fragment of set theory. 

We define the ordering on formulas as follows. We start with an ordering on
atoms by letting $(s_1 \approx t_1) \succ (s_2 \subseteq t_2) \succ (s_3 \in t_3)$ for all terms $s_1, s_2, s_3, t_1, t_2, t_3$.
On terms we assume some simplification ordering that is total on ground terms,
for example a lexicographic path ordering. We extend it lexicographically to
atoms with the same predicate symbol. Literals are ordered first with respect to
their atom and then to their polarity. That is, $A \succ B$ implies $[\neg]A \succ [\neg]B$ and
$\neg A \succ A$ for any atoms $A$ and $B$. This is extended to clauses by the multiset
extension of the literal ordering and to propositions in clause normal form by
the multiset extension of the clause ordering.

This ordering is total on ground clauses, since the term ordering is total, and 
it is extended to atoms, literals and clauses so that this property is preserved. 
By the same argument it is well-founded. 
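
The multiset extension used to lift the literal ordering to clauses can be written down directly. The sketch below is an illustration only: literals are hypothetical triples `(predicate, args, negated)`, the predicate precedence follows the text, and plain alphabetic comparison of argument strings stands in for the simplification ordering on terms, which is an assumption made here for brevity and not a real lexicographic path ordering.

```python
from collections import Counter

def multiset_greater(m, n, greater):
    """Multiset extension of a strict ordering: m is greater than n if the
    multisets differ and every element occurring more often in n is dominated
    by some element occurring more often in m."""
    only_m = Counter(m) - Counter(n)
    only_n = Counter(n) - Counter(m)
    if not only_m:
        return False                      # m equals n or is contained in n
    return all(any(greater(x, y) for x in only_m) for y in only_n)

# Precedence on predicate symbols: equality above subset above membership.
PRECEDENCE = {"eq": 2, "subset": 1, "in": 0}

def literal_key(lit):
    pred, args, negated = lit
    # Compare by predicate, then arguments (assumed stand-in for the term
    # ordering), then polarity, so that a negative literal is above its atom.
    return (PRECEDENCE[pred], args, negated)

def literal_greater(l1, l2):
    return literal_key(l1) > literal_key(l2)

def clause_greater(c1, c2):
    """Clause ordering as the multiset extension of the literal ordering."""
    return multiset_greater(c1, c2, literal_greater)

# not (a cap a subset a)   versus   b in (a cap a)
C0 = [("subset", ("cap(a,a)", "a"), True)]
C1 = [("in", ("b", "cap(a,a)"), False)]
```

Here `clause_greater(C0, C1)` holds because the subset predicate has precedence over membership, which is the reason the conclusions of the narrowing inferences in the example of this section are smaller than their premise.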

We have to show that the ordering is compatible with the rewrite relation 
followed by instantiation. We first consider the effect of applying a rewrite rule 
to a single literal in a clause, and compare the instantiated clause normal forms.
There are six cases, and we easily see that in each case the ordering holds:

$$\begin{aligned}
x \approx y \vee C &\succ (x \subseteq y \vee C) \wedge (y \subseteq x \vee C) \\
\neg\, x \approx y \vee C &\succ \neg\, x \subseteq y \vee \neg\, y \subseteq x \vee C \\
x \subseteq y \vee C &\succ \neg\, t \in x \vee t \in y \vee C \quad \text{for any ground term } t \\
\neg\, x \subseteq y \vee C &\succ (f(x,y) \in x \vee C) \wedge (\neg f(x,y) \in y \vee C) \\
x \in y \cap z \vee C &\succ (x \in y \vee C) \wedge (x \in z \vee C) \\
\neg\, x \in y \cap z \vee C &\succ \neg\, x \in y \vee \neg\, x \in z \vee C
\end{aligned}$$

The first four are covered by the precedence of the predicate symbols, and the
last two by the subterm property of the term ordering. This ordering extends
to a context containing additional clauses. An atom in a proposition may lead 
to several occurrences of the atom in the clause normal form. Rewriting such an 
atom thus leads to the replacement of several atoms. We may obtain its effect on 
the clause normal form by chaining together several of the simple replacements. 
Then by transitivity the ordering is compatible with any rewrite step. Since our 
ordering includes the transformation to clause normal form, compatibility also 
holds for Extended Narrowing inferences. 

Now suppose that in this theory we want to prove idempotency of intersection,
i.e., $\forall x.\, x \cap x \approx x$. To simplify the presentation, we prove only the direction
$\forall x.\, x \cap x \subseteq x$. We negate and skolemize and obtain $N_0 = \{\neg\, a \cap a \subseteq a\}$. To
illustrate the model construction we also give the candidate models corresponding
to the clause sets in the derivation. Since the only clause in $N_0$ has no positive
literal, we get $P(N_0) = \emptyset$. However, $I(N_0)$ is not empty, as it contains atoms
such as $a \subseteq a$. We check that $C_0 = \neg(a \cap a \subseteq a)$ is false in $I(N_0)$ by rewriting
$A_0 = a \cap a \subseteq a$:

$$a \cap a \subseteq a \;\to\; \forall x.\,(x \in a \cap a \Rightarrow x \in a) \;\to\; \forall x.\,(x \in a \wedge x \in a) \Rightarrow x \in a$$

We easily see that the normal form is a tautology, hence it is true in particular
in $I(N_0)$. Then by definition $A_0$ is true and $C_0$ is false in $I(N_0)$. Thus $C_0$ is the
least counterexample, as it is the only ground instance of a clause in $N_0$. This
counterexample can be reduced by Extended Narrowing: By rewriting $C_0$ we get

$$\neg(\forall x.\, x \in a \cap a \Rightarrow x \in a)$$

which we have to transform to clause normal form:

$$\neg(\forall x.\, x \in a \cap a \Rightarrow x \in a) \;\Longrightarrow\; \exists x.\,(x \in a \cap a \wedge \neg(x \in a)) \;\Longrightarrow\; b \in a \cap a \wedge \neg\, b \in a$$

where $b$ is the skolem constant introduced for $x$. For each clause in the CNF we
have an Extended Narrowing inference that has it as its conclusion:



$$\frac{\neg\, a \cap a \subseteq a}{b \in a \cap a} \qquad (21)$$

and

$$\frac{\neg\, a \cap a \subseteq a}{\neg\, b \in a} \qquad (22)$$

We pick the first inference as its conclusion is false in $I(N_0)$, and let
$N_1 = N_0 \cup \{b \in a \cap a\}$. We notice that $C_1 = b \in a \cap a$ is not productive, since it is
reducible to $b \in a \wedge b \in a$, hence we get another Extended Narrowing inference

$$\frac{b \in a \cap a}{b \in a}$$

which originates from both clauses of the reduct. We let $N_2 = N_1 \cup \{b \in a\}$. Now
$b \in a$ is productive in $I(N_2)$, i.e., $P(N_2) = \{b \in a\}$, and the least counterexample
for $I(N_2)$ is once again $C_0$. To reduce it we now need the second inference
above (22), and let $N_3 = N_2 \cup \{\neg\, b \in a\}$. The new clause becomes the least
counterexample. It is irreducible and can be resolved with $b \in a$:

$$\frac{b \in a \qquad \neg\, b \in a}{\bot}$$

In this example we have used the model construction as our guide, choosing 
always the inference that reduces the least counterexample. We have done this 
to illustrate the model construction. In practice, however, it is usually not feasible 
to construct the model explicitly. 



9 Conclusions and Further Work 

We have proven the refutational completeness for Extended Narrowing and Res- 
olution with ordering restrictions, under the proviso that a suitable ordering 
for the given theory exists. We have avoided the problem that skolemization 
does not preserve logical equivalence by adding axioms implying that equiva- 
lence. It remains to investigate how useful ENAR is in practice, with or without 
the ordering restrictions. To this end we are currently working on a prototype 
implementation of ENAR in ELAN^. 

This work may be viewed as a first step towards calculi that integrate clause 
normal form computation with inferences. In particular for logical equivalences it 
seems preferable to use them for rewriting, instead of destroying their structure 
by an initial transformation to clause normal form. ENAR is limited by the fact 
that the rewrite system is fixed. It will be interesting to see whether it is possible 
to find good inference systems where equivalences are added dynamically, to 
perform a kind of paramodulation on the level of propositions. 

In the case where the rewrite system is positive in the sense of Dowek and 
Werner [7] the closure operation in our model construction resembles their con- 
struction of a premodel as a fixpoint of a functional derived from the rewrite 
rules. It will be interesting to further investigate this connection for the other 
cases in order to better understand which congruences lead to sequent calculi 
with the cut elimination property. 



^ http://www.loria.fr/equipes/protheo/SOFTWARES/ELAN/

Abstract. The two-variable fragment $\mathcal{L}^2_\approx$ of first-order logic is the set
of formulas that do not contain function symbols, that possibly contain
equality, and that contain at most two variables. This paper shows how
resolution theorem-proving techniques can be used to provide an
algorithm for deciding whether any given formula in $\mathcal{L}^2_\approx$ is satisfiable.
Previous resolution-based techniques could deal only with the equality-free
subset $\mathcal{L}^2$ of the two-variable fragment.



1 Introduction 

The two-variable fragment $\mathcal{L}^2_\approx$ is the set of formulas that do not contain
function symbols, that possibly contain equality ($\approx$), and that use only two variables.
The two-variable fragment without equality $\mathcal{L}^2$ is the subset of $\mathcal{L}^2_\approx$ not involving
the predicate $\approx$. For example, the formula $\forall x \exists y [r(x,y) \wedge \forall x (r(y,x) \to x \approx y)]$,
stating that every element is r-related to some element whose only r-successor
is itself, is in $\mathcal{L}^2_\approx$ (but not in $\mathcal{L}^2$). Note in particular the 're-use' of the
variable $x$ by nested quantifiers in this example. In the same way, it is possible
to translate modal formulas into $\mathcal{L}^2$ (without equality) by reusing variables.
For example, the modal formula $\Box\Diamond\Box a$ can be translated into $\forall y (r(x,y) \to
\exists x (r(y,x) \wedge \forall y (r(x,y) \to a(y))))$. No equality is needed for translating modal
formulas.
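
The variable re-use in the modal translation is easy to see in code. The following Python sketch implements the usual relational translation with exactly two variable names that swap roles at every modal operator. The tuple encoding of modal formulas, the accessibility relation name `r`, and the ASCII output syntax are choices made for this illustration and do not come from the paper.

```python
def translate(f, cur="x", other="y"):
    """Translate a modal formula into a first-order formula (as a string)
    that uses only the two variables cur and other.

    Modal formulas are nested tuples (hypothetical encoding):
      ("prop", name), ("not", f), ("and", f, g), ("box", f), ("dia", f)
    """
    tag = f[0]
    if tag == "prop":
        return f"{f[1]}({cur})"
    if tag == "not":
        return f"~{translate(f[1], cur, other)}"
    if tag == "and":
        return f"({translate(f[1], cur, other)} & {translate(f[2], cur, other)})"
    if tag == "box":
        # Quantify over the other variable, then swap the roles of the two.
        return f"forall {other}(r({cur},{other}) -> {translate(f[1], other, cur)})"
    if tag == "dia":
        return f"exists {other}(r({cur},{other}) & {translate(f[1], other, cur)})"
    raise ValueError(f"unknown modal formula: {f!r}")

BOX_DIA_BOX_A = ("box", ("dia", ("box", ("prop", "a"))))
```

Calling `translate(BOX_DIA_BOX_A)` yields `forall y(r(x,y) -> exists x(r(y,x) & forall y(r(x,y) -> a(y))))`, which is the formula given in the text with `x` as the only free variable.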

Both two-variable fragments are known to be decidable. That is: an algorithm
exists which, given any formula $\phi \in \mathcal{L}^2_\approx$, will determine whether $\phi$ is satisfiable.
In [GKV97], decidability of $\mathcal{L}^2_\approx$ is proven by analyzing the structure of possible
models, and showing that if a formula $\phi$ has any models at all, then it has
a model of size $O(2^{|\phi|})$. This gives in principle a decision procedure in
nondeterministic exponential time. However this procedure is probably inefficient in
practice. This is caused by inherent problems of backtracking. Any backtracking
procedure will be spending time retrying details that are irrelevant for the truth
of the formula. Intelligent backtracking can decrease this problem, but cannot
completely remove it. Moreover, a backtracking procedure cannot be prevented
from redoing the same work in different branches. Improved implementations
might decrease this problem, but it cannot be removed completely.

In contrast, a resolution procedure works bottom up, starting with the
formula in which one is interested. This means that every clause that is derived is
related to the original formula, and hence more likely to be relevant. Additionally, 
a clause can be seen as a lemma, which can be used many times, and which can be 
seen as representing the set of its instances. Because of this, the risk of repeated 
work is decreased. 

Another advantage, of a more practical nature, is that resolution decision
procedures are close to the standard methods of full first-order automated
theorem proving, so that existing implementations and optimizations can be used. Indeed
only a small modification in a standard prover is needed in order to obtain a
resolution decision procedure.

To date, resolution-based decision procedures have been developed for
various fragments of first-order logic, including the guarded fragment with
equality, the Gödel class, and the two-variable fragment without equality (see
[dN95], [dN00a], [GdN99]).

A resolution theorem prover (for unrestricted first-order logic) works as
follows: The first-order formula is translated into a set of clauses through a
clausal normal form transformation. After that, the resolution prover derives
new clauses, using derivation rules. Examples of derivation rules are resolution,
factoring and paramodulation. The prover can also delete redundant clauses,
using certain deletion rules. Common deletion rules are subsumption, demodulation
and tautology elimination. This process terminates when either the empty clause
is derived, or a stable set of clauses is reached. This is a set for which every clause
that can be derived can be immediately deleted by one of the deletion rules. For
full first-order logic, there are formulas for which the process will not terminate.
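
The loop just described can be illustrated for the propositional case, where termination is guaranteed. The Python sketch below is a simplification and not the procedure of this paper: clauses are sets of non-zero integers (a negative integer is a negated atom), resolution is the only derivation rule (factoring is implicit in the set representation), and tautology elimination and subsumption are the only deletion rules. The function names are hypothetical.

```python
def resolvents(c, d):
    """All propositional resolvents of the clauses c and d."""
    for lit in c:
        if -lit in d:
            yield frozenset((c - {lit}) | (d - {-lit}))

def is_tautology(c):
    """A clause containing a literal and its complement is a tautology."""
    return any(-lit in c for lit in c)

def saturate(clauses):
    """Return True if the empty clause is derived, and False if a stable
    set of clauses is reached (the input is then satisfiable)."""
    kept = []                                   # clauses already processed
    queue = [frozenset(c) for c in clauses]     # clauses waiting, oldest first
    while queue:
        given = queue.pop(0)
        # Deletion rules: tautology elimination and forward subsumption.
        if is_tautology(given) or any(k <= given for k in kept):
            continue
        if not given:
            return True                         # empty clause derived
        # Backward subsumption: drop kept clauses subsumed by the new one.
        kept = [k for k in kept if not given <= k]
        # Derivation rule: resolve the given clause with every kept clause.
        for k in kept:
            queue.extend(resolvents(given, k))
        kept.append(given)
    return False                                # stable set reached
```

Processing the waiting clauses in first-in, first-out order means no inference is postponed forever. For example, `saturate([[1], [-1, 2], [-2]])` derives the empty clause, while `saturate([[1, 2], [-1, 2]])` stops with the stable set consisting of the single clause `{2}`.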

Resolution decision procedures are obtained by first identifying an appropri- 
ate clause fragment. Then it is shown that certain restrictions of resolution are 
complete for the given clause fragment, and that all newly derived clauses are 
within the clause fragment. After that it is shown that the first order formulas 
of the fragment under consideration can be translated into the given clause frag- 
ment. Finally it is shown that there exists only a finite number of non-redundant 
clauses in the given fragment. 

The strategy that we give in this paper is different from usual decision pro- 
cedures in the fact that it consists of two stages. First, the clauses are partially 
saturated under a restricted form of resolution. After that, it is shown that 
clauses containing equality can be replaced by clauses without equality, with- 
out affecting satisfiability. The result is a clause set that corresponds to the 
two-variable fragment without equality. 

At this point, one could continue the work by using any decision procedure
for $\mathcal{L}^2$. Since we prefer to stay within the resolution framework, we present a novel
resolution decision procedure based on a liftable order.

The plan of the paper is as follows. Section 2 motivates the search for an
efficient decision procedure for $\mathcal{L}^2_\approx$. Section 3 shows how equality can be removed
from a formula $\phi$ of $\mathcal{L}^2_\approx$ without affecting its satisfiability. Section 4 then presents
the new resolution-based algorithm for determining satisfiability of formulas of
$\mathcal{L}^2$. We assume familiarity with the standard terminology and the basic
techniques of resolution theorem proving. The reader is referred to [FLTZ93] Ch. 2
for the relevant definitions.



2 Motivation 

A logic is said to have the finite model property if any satisfiable formula in that
logic is satisfiable in a finite structure. It is easy to see that any fragment of
first-order logic having the finite model property is decidable; and indeed, most of
the known decidable fragments of first-order logic have the finite model property.
(For a comprehensive survey, see Börger, Grädel and Gurevich [BGG97].) One
such fragment of particular interest here is the so-called Gödel class: the set
of first-order formulas without equality which, when put in prenex form, have
quantifier prefixes matching the pattern $\exists^*\forall\forall\exists^*$. Gödel [G33] showed that the
Gödel class has the finite model property, and is thus decidable. In the same
paper, Gödel claimed that allowing $\approx$ in formulas of the Gödel class would
not affect the finite model property, a claim which was later shown to be false
by Goldfarb [G84]. Between these two discoveries, Scott [S62] showed that any
formula of the two-variable fragment can be transformed into a formula in the
Gödel fragment which is equisatisfiable. Relying on Gödel's incorrect claim, Scott
concluded decidability for $\mathcal{L}^2_\approx$. Of course, what Scott actually showed was the
decidability for $\mathcal{L}^2$ only. That the full two-variable fragment does indeed have
the finite model property was eventually established by Mortimer [M75].

Most proofs of the finite model property actually yield a bound on the size
of a smallest model of a satisfiable formula $\phi$ in terms of the size of (number of
symbols in) $\phi$, and the result for the two-variable fragment is a case in point. A
satisfiable formula $\phi \in \mathcal{L}^2_\approx$ of size $n$ has a model with at most $2^{cn}$ elements, for
some constant $c$ (see [BGG97], pp. 377-381). Therefore, we can determine the
satisfiability of $\phi$ by enumerating all such models, a process which can evidently be
completed in nondeterministic exponential time. Moreover, it can be shown
using standard techniques that satisfiability in $\mathcal{L}^2_\approx$ is in fact NEXPTIME-complete
(as, indeed, is satisfiability in $\mathcal{L}^2$). Hence, the complexity of model enumeration
agrees with the known worst-case complexity of determining satisfiability in $\mathcal{L}^2_\approx$.

Nevertheless, enumeration of models up to a certain size is in practice an
inefficient method for determining satisfiability in $\mathcal{L}^2_\approx$, especially if no model
exists. A much more promising method is to adapt a resolution-based theorem
prover so that termination on formulas of $\mathcal{L}^2_\approx$ is guaranteed. Such an approach
has already been employed for other decidable fragments. In particular, the
Gödel class can be decided using ordered resolution; moreover, as was pointed
out in [FLTZ93], by applying Scott's reduction from $\mathcal{L}^2$ to the Gödel class, the
same technique can be used to decide $\mathcal{L}^2$ as well. Unfortunately however, this
method does not apply to the whole fragment $\mathcal{L}^2_\approx$, since adding equality to the
Gödel class leads to undecidability. The main contribution of the present paper
is to show that, with the aid of some technical manoeuvres, this approach can
nevertheless be extended to the whole fragment $\mathcal{L}^2_\approx$. To the best of the authors'
knowledge, this is the first really practical decision procedure that has been
proposed for the full two-variable fragment.

The fragment $\mathcal{L}^2_\approx$ is of particular interest when dealing with natural language
input, because many simple natural language sentences translate into $\mathcal{L}^2_\approx$. To give
a somewhat fanciful example, the sentence

Every meta-barber shaves every man who shaves no man who shaves himself

translates to the two-variable formula

$$\forall x\,(\text{meta-barber}(x) \rightarrow \forall y\,((\text{man}(y) \wedge \forall x\,((\text{man}(x) \wedge \text{shave}(x,x)) \rightarrow \neg\text{shave}(y,x))) \rightarrow \text{shave}(x,y))).$$

Although by no means all of English translates to the two-variable fragment
(just think, for example, of ditransitive verbs), a useful subset of English can
nevertheless be treated in this way. In particular, Pratt-Hartmann [PH00] gives
a naturally circumscribed fragment of English which is shown to have exactly
the expressive power of $\mathcal{L}^2_\approx$. Certainly, the two-variable fragment is more useful
when dealing with natural language than other well-known decidable fragments,
such as the guarded fragment [AvBN98] or any of the quantifier prefix fragments,
whose formulas do not fit translations of natural language constructs easily.

3 Making Equality Disappear 

In this section, we give a method for removing equality from a formula in
$\mathcal{L}^2_\approx$, based on resolution. Let $\phi$ be some formula of $\mathcal{L}^2_\approx$. We can assume that $\phi$
contains only unary and binary predicates, since predicates of higher arity (as
long as they feature only two variables) can be removed by a transformation (see
[GKV97] for details). Throughout this section, if $\psi(x)$ is an $\mathcal{L}^2_\approx$-formula whose
only free variable is $x$, we use the abbreviation $\exists!x\,\psi(x)$ for the $\mathcal{L}^2_\approx$-formula

$$\exists x[\psi(x) \wedge \forall y(\psi(y) \rightarrow x \approx y)],$$

asserting that $\psi$ is satisfied by exactly one object.

Occurrences of the $\approx$-symbol fall into two groups. Negative occurrences can
be 'simulated' without recourse to equality. Positive occurrences can be
restricted to those belonging to a $\exists!$ quantifier. This is done in the key step of our
procedure, which is described in Lemma 4. The remaining occurrences of $\approx$ can
be axiomatized within $\mathcal{L}^2$. This enables us to remove all occurrences of $\approx$.

Definition 1. An atom is defined as usual. A literal is an atom, or its negation.
A formula $\alpha$ is in conjunctive normal form if it has form $c_1 \wedge \cdots \wedge c_n$, where
each $c_i$ ($1 \le i \le n$) is a disjunction $c_i = A_{i,1} \vee \cdots \vee A_{i,l_i}$ of literals.
A formula in CNF is not the same as a set of clauses, because the clauses can
be universally quantified. We begin by removing individual constants from our
formula $\phi$.

Lemma 1. Let $\phi \in \mathcal{L}^2_\approx$, and let the sets of individual constants, unary predicates
and binary predicates occurring in $\phi$ be $D$, $P$ and $R$, respectively. Let $\phi'$ be the
result of replacing any atoms in $\phi$ involving individual constants with formulas
according to the following table:

    Atom          Replacement formula
    $p(d)$        $\exists x(p(x) \wedge p_d(x))$
    $r(d,x)$      $\exists y(r(y,x) \wedge p_d(y))$
    $r(d,y)$      $\exists x(r(x,y) \wedge p_d(x))$
    $r(x,d)$      $\exists y(r(x,y) \wedge p_d(y))$
    $r(y,d)$      $\exists x(r(y,x) \wedge p_d(x))$
    $r(d,d')$     $\exists x \exists y(r(x,y) \wedge p_d(x) \wedge p_{d'}(y))$

where $p \in P$, $r \in R$, $d \in D$, and where the unary predicates $p_d(x)$ (for $d \in D$)
are all new. Then $\phi$ is equisatisfiable with

$$\phi_0 := \phi' \wedge \bigwedge_{d \in D} \exists!x\, p_d(x).$$
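
The replacement table of Lemma 1 can be read as a small rewriting function. The following Python sketch is illustrative only: the encoding of atoms as a predicate name plus a tuple of argument names, the function name `eliminate_constant_atom`, and the textual output syntax (`exists`, `&`, `p_d`) are assumptions made here and are not taken from the paper.

```python
def eliminate_constant_atom(pred, args, constants):
    """Rewrite one atom according to the table of Lemma 1.

    pred      -- predicate name (hypothetical string encoding)
    args      -- tuple of argument names, of length 1 or 2
    constants -- set of names that denote individual constants
    Atoms that mention no constant are returned unchanged.
    """
    def pd(d, v):
        # the fresh unary predicate p_d applied to variable v
        return f"p_{d}({v})"

    if len(args) == 1:
        (a,) = args
        if a in constants:
            return f"exists x ({pred}(x) & {pd(a, 'x')})"
        return f"{pred}({a})"

    a, b = args
    if a in constants and b in constants:
        return f"exists x exists y ({pred}(x,y) & {pd(a, 'x')} & {pd(b, 'y')})"
    if a in constants:
        # quantify over the variable that is not already free in the atom
        v = "y" if b == "x" else "x"
        return f"exists {v} ({pred}({v},{b}) & {pd(a, v)})"
    if b in constants:
        v = "y" if a == "x" else "x"
        return f"exists {v} ({pred}({a},{v}) & {pd(b, v)})"
    return f"{pred}({a},{b})"
```

The sketch only covers the atom-level rewriting; the conjunct asserting that each new predicate holds of exactly one object still has to be added separately.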

In fact, individual constants will be reintroduced later; however removing them 
at this point makes the key step described in Lemma 4 much easier to follow. 
Next, we convert to Scott normal form. The following result is standard (see, 
e.g. [BGG97] lemma 8.1.2). 

Lemma 2. Let $\phi$ be a formula in $\mathcal{L}^2_\approx$. There is an equisatisfiable formula $\phi_1$
with form $\phi_1 = \beta_1 \wedge \cdots \wedge \beta_n$, where each $\beta_i$ has one of the following three forms:

$$\exists x\, \alpha_i, \quad \forall x \exists y\, \alpha_i, \quad \text{or} \quad \forall x \forall y\, \alpha_i.$$

Each $\alpha_i$ is a formula in conjunctive normal form. We call the types of the $\beta_i$
from left to right Type 1, Type 2, and Type 3. If $\beta_i$ is a Type 1 formula, then $\alpha_i$
contains at most the variable $x$. If $\beta_i$ is a Type 2 or a Type 3 formula, then $\alpha_i$
contains at most the variables $x$ and $y$. There are no constants in the $\alpha_i$.

The transformation in question introduces some new predicate letters, but
no new individual constants or function symbols. Next, we move all occurrences
of $\approx$ out of the $\alpha_i$ into the quantifiers.

Lemma 3. Let $\phi_1$ be as defined in Lemma 2. Formula $\phi_1$ can be transformed
into a formula $\phi_2 = \beta_1 \wedge \cdots \wedge \beta_n$, where each $\beta_i$ has one of the following forms:

$$\exists x\, \alpha_i,$$

$$\forall x \exists y\, (x \not\approx y \wedge \alpha_i),$$
or

$$\forall x \forall y\, (x \approx y \vee \alpha_i).$$

Now each of the $\alpha_i$ is a formula without equality, in conjunctive normal form.
If $\phi_2$ contains more than one Type 3 formula, then they can be merged. In
the sequel we will assume that this is done, and that there is only one Type 3
formula in $\phi_2$.

We come to the key idea of the present transformation: the elimination of
the positive occurrence of $\approx$ in a disjunction $\forall x \forall y(\alpha(x,y) \vee \beta(x) \vee \gamma(y) \vee x \approx y)$
of $\phi_2$. Our solution is to saturate $\phi_2$ partially under resolution, and then to
eliminate the disjunctions that have a non-empty $\alpha(x,y)$. When this is done, we
have only disjunctions of the form $\forall x \forall y(x \approx y \vee \beta(x) \vee \gamma(y))$. These can be read
as: If $\beta(x)$ does not hold everywhere, and $\gamma(y)$ does not hold everywhere, then
there is one point $c$, which is the only point on which $\beta(c)$ does not hold, and also
the only point on which $\gamma(c)$ does not hold. This provides the transformation of
$\approx$ into $\exists!$.

Definition 2. We need the following, restricted version of the resolution rule.
It is restricted because we allow resolution only between two-variable literals.

Let $\beta(x,y) \vee r_1$ and $\neg\beta(x,y) \vee r_2$ be disjunctions, occurring in an $\alpha_i$ of
$\phi_2$. Then $r_1 \vee r_2$ is a resolvent. The $r_1$ and $r_2$ are disjunctions of literals. We
implicitly assume that $r_1 \vee r_2$ is normalized after the resolution step. That means
that multiple occurrences of the same literal are removed.

We allow swapping of variables, but we do not allow proper instantiation. So
$\beta(y,x) \vee r_1$ and $\neg\beta(x,y) \vee r_2$ can resolve into $r_1[x \leftrightarrow y] \vee r_2$.

Inside a formula $\phi_2$, defined as in Lemma 3, we allow resolution as follows:
Resolution is allowed between disjunctions inside the Type 3 formula. The results
are added to the Type 3 formula. We also allow resolution between a disjunction
inside a Type 2 formula and a disjunction inside a Type 3 formula. The result
is added to the Type 2 disjunction that was used.

It is easily checked that the resolution rules are sound. If $\phi'$ is obtained from
$\phi$ by a resolution step, then $\phi' \leftrightarrow \phi$. Termination follows from the fact that only
finitely many normalized disjunctions exist over a given signature.
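
The restricted rule of Definition 2 can be sketched in a few lines of Python. The encoding below is an assumption made for illustration: a literal is a triple `(sign, predicate, args)` with `args` drawn from the variable names `"x"` and `"y"`, and a disjunction is a `frozenset` of literals, so that normalization (removal of duplicate literals) is automatic. For brevity the sketch treats every disjunction as if it belonged to the Type 3 formula and does not track the Type 2 / Type 3 bookkeeping described above.

```python
from itertools import product

def swap(lit):
    """Exchange the variables x and y in a literal."""
    sign, pred, args = lit
    flip = {"x": "y", "y": "x"}
    return (sign, pred, tuple(flip[a] for a in args))

def is_two_variable(lit):
    """True if the literal mentions both x and y."""
    return set(lit[2]) == {"x", "y"}

def resolvents(c1, c2):
    """Restricted resolvents of c1 and c2: only two-variable literals are
    resolved upon, and c1 may be used with its variables swapped."""
    out = set()
    for variant in (c1, frozenset(swap(l) for l in c1)):
        for lit in variant:
            sign, pred, args = lit
            comp = (not sign, pred, args)
            if is_two_variable(lit) and comp in c2:
                out.add((variant - {lit}) | (c2 - {comp}))
    return out

def saturate(clauses):
    """Close a finite set of disjunctions under the restricted rule."""
    closed = set(clauses)
    while True:
        new = set()
        for c1, c2 in product(closed, repeat=2):
            new |= resolvents(c1, c2)
        if new <= closed:
            return closed
        closed |= new
```

The loop terminates for the reason given in the text: over a fixed signature and two variable names there are only finitely many distinct sets of literals.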

Lemma 4. Let $\phi_2$ be defined as in Lemma 3. Let $\phi_3$ be its closure under
resolution, as defined in Definition 2. Let $\phi_4$ be obtained from $\phi_3$ by removing all
disjunctions containing a two-variable literal (other than $\approx$) from the Type 3
formula. Then $\phi_3$ has a model iff $\phi_4$ has a model.

Proof. It is clear that if $\phi_3$ has a model, then $\phi_4$ has a model, since resolution
is a sound inference rule.

For the other direction, let $\mathfrak{B}$ be a structure in which $\phi_4$ is true. We will
modify $\mathfrak{B}$ into a new structure $\mathfrak{B}^*$, in which $\phi_3$ holds. We use $B$ for the domain
of $\mathfrak{B}$. The new structure $\mathfrak{B}^*$ will have the same domain $B$. It will be obtained
from $\mathfrak{B}$ by changing the truth-values of the two-variable predicates. For the rest,
$\mathfrak{B}^*$ will be identical to $\mathfrak{B}$.

We assume that both $\phi_3$ and $\phi_4$ are decomposed as in Lemma 3. We write
$\beta_1 \wedge \cdots \wedge \beta_n$ for the decomposition of $\phi_3$, and $\beta'_1 \wedge \cdots \wedge \beta'_n$ for the decomposition
of $\phi_4$. We define $\alpha_i$ and $\alpha'_i$ accordingly. We use $t$ for the index of both Type 3
subformulas. Obviously the $\beta'_i$ can be arranged in such a way that $(i \neq t) \Rightarrow (\beta_i = \beta'_i)$.

As said before, we intend to reinterpret the two-variable predicates in such
a way that $\beta_t$ becomes true. When doing so, we run the risk of making a $\beta_i$ of
Type 2 false. In order to avoid this we need the following:

For each Type 2 subformula $\beta_i$ of $\phi_4$, we assume a choice function $f_i$ of type
$B \rightarrow B$. It is defined as follows: Because $\beta_i = \forall x \exists y(x \not\approx y \wedge \alpha_i)$ is true in $\mathfrak{B}$,
for each $b_1 \in B$, there exists a $b_2$ in $B$, distinct from $b_1$, s.t. $\mathfrak{B} \models \alpha_i(b_1, b_2)$. The choice
function $f_i$ is defined by choosing one such $b_2$ for each $b_1$.

Let $b_1$ and $b_2$ be two distinct elements of $B$. We define the pattern of $\mathfrak{B}$ on
$\{b_1, b_2\}$ as the vector of truth values for the binary predicates involving both of
$b_1$ and $b_2$. So, if $p_1, \ldots, p_r$ are all the binary predicate symbols of $\phi_3$, then for
each $\{b_1, b_2\}$, the pattern determines whether or not $\mathfrak{B} \models p_j(x,y)(b_1, b_2)$ and
whether or not $\mathfrak{B} \models p_j(y,x)(b_1, b_2)$. It does not say anything about the unary
predicates, nor about $\mathfrak{B} \models p_j(x,y)(b_1, b_1)$ or $\mathfrak{B} \models p_j(x,y)(b_2, b_2)$.

The intuition of the construction is as follows: If $\phi_3$ is not true in $\mathfrak{B}$, this is
caused by the fact that there are $b_1, b_2 \in B$, for which $\mathfrak{B} \not\models \alpha_t(b_1, b_2)$. This
must be caused by the fact that there is a disjunction $\alpha(x,y) \vee \theta(x) \vee \eta(y) \in \alpha_t$, for
which $\mathfrak{B} \not\models \theta(x)(b_1)$, $\mathfrak{B} \not\models \eta(y)(b_2)$, and $\mathfrak{B} \not\models \alpha(x,y)(b_1, b_2)$.

We will change the pattern on $\{b_1, b_2\}$ in such a way that $\alpha(x,y)$ will become
true.

Before we can proceed, we need to define the subformulas that are involved:
Let $\lambda_1, \ldots, \lambda_i, \ldots, \lambda_p$, $p \ge 0$, be the indices of the Type 2 subformulas of $\phi_3$ for
which $f_{\lambda_i}(b_1) = b_2$. Similarly, let $\mu_1, \ldots, \mu_j, \ldots, \mu_q$, $q \ge 0$, be the indices of the
Type 2 subformulas of $\phi_3$ for which $f_{\mu_j}(b_2) = b_1$.

The $\alpha_t$ and the $\alpha_{\lambda_i}$ and $\alpha_{\mu_j}$ are formulas in conjunctive normal form, i.e.
conjunctions of disjunctions. We are going to select the disjunctions whose truth
depends on the binary predicates on $\{b_1, b_2\}$. For each $i$ ($1 \le i \le p$), let $\alpha^1_{\lambda_i}$ be
obtained from $\alpha_{\lambda_i}$ by selecting those disjunctions of which the literals involving
one variable are false in $\mathfrak{B}$ on $(b_1, b_2)$. Let $\alpha^2_{\lambda_i}$ be obtained by removing the
one-variable literals from $\alpha^1_{\lambda_i}$. For $j$ ($1 \le j \le q$), we define $\alpha^1_{\mu_j}$ and $\alpha^2_{\mu_j}$
analogously.

For $\alpha_t$ we need two copies, because of the two directions involved. Let $\alpha^1_{t\rightarrow}$ be
obtained from $\alpha_t$ by selecting those disjunctions of which the literals involving
one variable are false in $\mathfrak{B}$ on $(b_1, b_2)$. Let $\alpha^2_{t\rightarrow}$ be obtained by deleting the
one-variable literals from $\alpha^1_{t\rightarrow}$. Similarly, let $\alpha^1_{t\leftarrow}$ be obtained by selecting the
disjunctions of which the one-variable literals are false on $(b_2, b_1)$. Then $\alpha^2_{t\leftarrow}$
is obtained by deleting the one-variable literals from $\alpha^1_{t\leftarrow}$.

It is clear that in order to obtain $\mathfrak{B}^*$, it is sufficient to replace in $\mathfrak{B}$ the
patterns on each $\{b_1, b_2\}$, $b_1 \neq b_2$, by a pattern making the $\alpha^2_{\lambda_i}$, $\alpha^2_{\mu_j}$, $\alpha^2_{t\rightarrow}$ and $\alpha^2_{t\leftarrow}$
true on $(b_1, b_2)$. The patterns can be replaced in $\mathfrak{B}$ in parallel.

It remains to show that the patterns exist. This is guaranteed by the fact
that $\alpha_t$ and the $\alpha_i$ of Type 2 are sufficiently closed under resolution. Since we
are considering a fixed $(b_1, b_2)$, we are dealing with a propositional problem.
We apply Lemma 11, using $r = \alpha^2_{t\rightarrow} \cup \alpha^2_{t\leftarrow}$, and using
$c_1 \wedge \cdots \wedge c_m = \alpha^2_{\lambda_1} \wedge \cdots \wedge \alpha^2_{\lambda_p} \wedge \alpha^2_{\mu_1} \wedge \cdots \wedge \alpha^2_{\mu_q}$.
It is easily checked that all necessary resolvents are allowed by Definition 2.
The $\alpha^2_{\lambda_i}$ and $\alpha^2_{\mu_j}$ are consistent, because they are true in $\mathfrak{B}$ on $(b_1, b_2)$. It is also
clear that $\alpha^2_{t\rightarrow} \cup \alpha^2_{t\leftarrow}$ cannot contain the empty clause, since it would originate from
a disjunction in $\alpha_t$ for which the one-variable literals are false in $\mathfrak{B}$ on $(b_1, b_2)$.
But this disjunction would be in $\alpha'_t$ as well, as it does not contain a two-variable
literal. This contradicts the assumption that $\phi_4$ is true in $\mathfrak{B}$.

Thus, Lemma 4 tells us that, if we saturate $\phi_2$ under resolution, then we
can delete all the disjunctions $\alpha(x,y) \vee \beta(x) \vee \gamma(y) \vee x \approx y$ for which $\alpha(x,y)$ is
non-empty. Although exhaustive application of resolution is computationally
expensive, in the context of determining satisfiability in the two-variable fragment,
the transformation step from $\phi_2$ to $\phi_4$ in fact comes for free. This is because
existing resolution-based approaches to determining satisfiability in $\mathcal{L}^2$ begin with
conversion to Scott normal form, followed by clausification and exhaustive
application of ordered resolution anyway. All the present procedure requires is that
we perform a restricted version of resolution first, and then pause to delete the
inseparable clauses, before resuming (pretty well) what we would have done even
if no equality were present. It hardly needs mentioning that the elimination of a
whole set of clauses requires no computation whatever: in particular, the
model-theoretic manipulations used to prove Lemma 4 form no part of the decision
procedure for $\mathcal{L}^2_\approx$.

The negative occurrences can be easily deleted by introducing a new,
irreflexive predicate neq. We add a formula $\forall x(\neg\text{neq}(x, x))$, and replace each
negative equality $x \not\approx y$ by $\text{neq}(x, y)$. We will do this later; for now we concentrate
on the positive equalities in the Type 3 formulas.

The remaining steps in our equality-deletion procedure are all straightforward.
There is (at most) one positive occurrence of $\approx$ left. It occurs in the
Type 3 subformula $\alpha_t$ and the other literals in $\alpha_t$ are unary. The following
logical equivalence is simple to verify:

Lemma 5. Let $\gamma(x)$ be a formula not involving the variable $y$ and let $\delta(y)$ be a
formula not involving the variable $x$. Then the formulas

$$\forall x \forall y(\gamma(x) \vee \delta(y) \vee x \approx y)$$

and

$$\forall x\, \gamma(x) \vee \forall x\, \delta(x) \vee (\exists!x\, \neg\gamma(x) \wedge \forall x(\gamma(x) \leftrightarrow \delta(x)))$$
are logically equivalent.
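
As a sanity check (not a proof), the equivalence of Lemma 5 can be tested exhaustively on small finite domains. The Python sketch below is an illustration under the assumption that $\gamma$ and $\delta$ may be treated simply as arbitrary subsets of the domain, which is all that matters for one-variable formulas; the function names are hypothetical.

```python
from itertools import product

def lhs(dom, gamma, delta):
    """forall x forall y (gamma(x) or delta(y) or x = y)"""
    return all(gamma[a] or delta[b] or a == b for a in dom for b in dom)

def rhs(dom, gamma, delta):
    """forall x gamma(x)  or  forall x delta(x)  or
    (exactly one x falsifies gamma, and gamma and delta agree everywhere)"""
    exactly_one = sum(1 for a in dom if not gamma[a]) == 1
    agree = all(gamma[a] == delta[a] for a in dom)
    return all(gamma.values()) or all(delta.values()) or (exactly_one and agree)

def check_lemma5(max_size=4):
    """Compare both sides for every pair of unary predicates on domains
    of size 1..max_size; return False on the first disagreement."""
    for n in range(1, max_size + 1):
        dom = range(n)
        for g in product([False, True], repeat=n):
            for d in product([False, True], repeat=n):
                gamma, delta = dict(zip(dom, g)), dict(zip(dom, d))
                if lhs(dom, gamma, delta) != rhs(dom, gamma, delta):
                    return False
    return True
```

Calling `check_lemma5()` compares the two formulas on all interpretations over domains with up to four elements.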

Using this, we can use the splitting rule to decompose the disjunctions of
the Type 3 formula. The result is a formula in which all positive occurrences of $\approx$
belong to a $\exists!$ quantifier. These can be eliminated by introducing new individual
constants.

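Lemma 6. Let $\eta$ be a conjunction of constant-free $\mathcal{L}^2$-formulas in prenex form
with quantifier prefixes $\forall x$ and $\forall x \exists y$, and for each $i$ ($1 \le i \le m$) let $\zeta_i$ be a
quantifier- and constant-free formula of $\mathcal{L}^2$ not involving the variable $y$. Define

$$\theta_i := \zeta_i(e_i) \wedge \bigwedge_p \forall x(\zeta_i(x) \rightarrow (p(x) \leftrightarrow p(e_i))) \wedge \bigwedge_q \forall x \forall y(\zeta_i(x) \rightarrow (q(x,y) \leftrightarrow q(e_i,y))) \wedge \bigwedge_q \forall x \forall y(\zeta_i(x) \rightarrow (q(y,x) \leftrightarrow q(y,e_i)))$$

where $e_1, \ldots, e_m$ are (new) individual constants, $p$ ranges over all unary predicates
mentioned in $\eta$ or any of the $\zeta_i$, and $q$ ranges over all binary predicates mentioned
in $\eta$ or any of the $\zeta_i$. Then the formulas

$$\psi := \eta \wedge \bigwedge_{1 \le i \le m} \exists!x\, \zeta_i(x)$$

and

$$\psi' := \eta \wedge \bigwedge_{1 \le i \le m} \theta_i$$

are equisatisfiable.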



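Proof. If $\mathfrak{B} \models \psi$, then it is easy to expand $\mathfrak{B}$ to a structure $\mathfrak{B}'$ such that
$\mathfrak{B}' \models \psi'$. Conversely, suppose that $\mathfrak{B}' \models \psi'$, where the domain of $\mathfrak{B}'$ is $B$. Let
$b_1, \ldots, b_m$ be the denotations of the constants $e_1, \ldots, e_m$ in $\mathfrak{B}'$, respectively. Now
define a function $f$ on $B$ as follows:

$$f(b) = \begin{cases} b_i & \text{if } \mathfrak{B}' \models \zeta_i[b] \\ b & \text{otherwise.} \end{cases}$$

And define the structure $\mathfrak{B}$ with domain $f(B)$ as follows:

$$\mathfrak{B} \models p[f(b)] \text{ iff } \mathfrak{B}' \models p[b]$$

$$\mathfrak{B} \models q[f(b), f(b')] \text{ iff } \mathfrak{B}' \models q[b, b']$$

where $b$ and $b'$ range over $B$, $p$ over the unary predicates mentioned in $\psi'$ and $q$ over
the binary predicates mentioned in $\psi'$. Since $\mathfrak{B}' \models \psi'$, $\mathfrak{B}$ is well-defined. It is
then easy to see that $\mathfrak{B} \models \psi$.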



Theorem 1. Let $\phi$ be any formula of $\mathcal{L}^2_\approx$. Then the steps described in this
section allow us to compute an equisatisfiable formula $\phi^*$ in the class $\mathcal{L}^2$. Indeed,
if $\phi^*$ is satisfiable over some domain, then $\phi$ is satisfiable over a subset of that
domain. Moreover, $\phi^*$ can be written as a conjunction of prenex formulas with
quantifier prefixes $\exists x$, $\forall x \forall y$ and $\forall x \exists y$.

Thus, the satisfiability problem for $\mathcal{L}^2_\approx$ has been reduced to that for $\mathcal{L}^2$. Moreover,
as we have observed, no significant extra computational cost is incurred in this
reduction.

Finally, we note that Theorem 1 allows us to infer that $\mathcal{L}^2_\approx$ has the finite
model property from the corresponding fact about the Gödel fragment. This
constitutes an alternative to the proofs cited in Section 2.
4 The 2-Variable Fragment without Equality

In this section, we provide a new decision procedure for the two-variable
fragment without equality. Practically, the method does not differ from the method
given in [dN00a], but the theoretical foundation is different. The method that
we give here is based on a liftable order, i.e. an order that is preserved by
substitution. The advantage of liftable orders is that they are better understood
theoretically. It is the (far) hope of the authors that this will eventually lead
to an understanding of what makes the resolution decision procedures work. At
this moment, the termination/completeness proofs are a collection of tricks. One
would hope for a real understanding of the relation between model-based decision
procedures and resolution-based decision procedures. Decision procedures
based on liftable orders appear to be a step in this direction.

As said before, our procedure makes use of indexed resolution [B71], also called
lock resolution. It works as follows: one starts with some clause set upon which
we want to apply resolution. First, integers are attached to the literals in the
initial clause set. This can be done in any arbitrary way, and distinct occurrences
of the same literal can be indexed with different integers. After that, the integers
can be used by an order restriction for determining which literals can be resolved
away. When a resolvent is formed, the literals in the resolvent simply inherit their
indices from the parent clauses. In standard versions of lock resolution, only the
literals indexed by a maximal integer can be resolved away. Our procedure is
slightly more general: we allow the selection function to look at both the index
and the literal itself. The key property possessed by this selection function
is that it is obtained by lifting some order $\prec$ on the ground indexed literals, as
we will explain below.

4.1 Indexed Resolution 



Definition 3. An indexed literal is a pair $(L, a)$, where $L$ is a literal, and $a$ is
an element of some index set. We use the index set $\{0, 1\}$. We write $L{:}a$ instead
of $(L, a)$. An indexed clause is a finite set of indexed literals.

The effect of a substitution $\Theta$ on an indexed literal $A{:}a$ is defined by $(A{:}a)\Theta =
A\Theta{:}a$. The effect of a substitution on a clause is defined memberwise.

In the sequel we use the term clause to mean either a clause (in the usual
sense of a finite set of literals) or an indexed clause. It will be clear from the
context what type of clause we mean. When present, the indices play no role in
the semantics of the clause; however, they do play a role in determining which
literals are selected. Extending the notion of a selection function to indexed
clauses in the obvious way, we define:

Definition 4. A selection function $\Sigma$ is a function mapping indexed clauses to
indexed clauses for which always $\Sigma(c) \subseteq c$. For a clause $c$, we call the indexed
literals in $\Sigma(c)$ selected.

Resolution. Let $c_1 = \{A_1{:}a_1\} \cup R_1$ and $c_2 = \{\neg A_2{:}a_2\} \cup R_2$ be clauses, s.t. $A_1$
and $A_2$ are unifiable and selected. Let $\Theta$ be the most general unifier of $A_1$
and $A_2$. Then the clause $R_1\Theta \cup R_2\Theta$ is a resolvent of $c_1$ and $c_2$.
Factoring. Let $c = \{A_1{:}a_1, A_2{:}a_2\} \cup R$ be a clause, s.t. $A_1{:}a_1$ is selected, and
$A_1, A_2$ have most general unifier $\Theta$. Then $\{A_2{:}a_2\Theta\} \cup R\Theta$ is a factor of $c$.

In addition to resolution and factoring, our decision procedure for $\mathcal{L}^2$ uses
the following rules.

Subsumption. If for clauses $c_1, c_2$ there is a substitution $\Theta$ such that $c_1\Theta \subseteq c_2$,
then $c_1$ subsumes $c_2$. In that case $c_2$ can be deleted from the database.
Splitting. The splitting rule can be applied when a clause $c$ can be partitioned
into two non-empty parts that do not have overlapping variables. If $c$ can be
partitioned as $R_1 \vee R_2$, then the prover tries to refute $R_1$ and $R_2$
independently.

Observe that the subsumption rule has to take the indices into account. The
clause $\{p(x,y){:}1\}$ does not subsume the clause $\{p(x,0){:}0\}$.
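
The remark about indices can be made concrete with a small subsumption test. The following Python sketch is only an illustration: it assumes a hypothetical encoding in which an indexed literal is a tuple `(sign, predicate, args, index)`, variables are the strings in `VARIABLES`, any other string is a constant, and a functional term is a tuple `(f, arg1, ..., argk)`. Subsumption is decided by naive backtracking over one-way matchings.

```python
VARIABLES = {"x", "y"}   # assumed naming convention for variables

def match(pattern, target, subst):
    """Extend subst so that pattern under subst equals target;
    return the extended substitution, or None if impossible."""
    if isinstance(pattern, tuple):
        if (not isinstance(target, tuple) or len(pattern) != len(target)
                or pattern[0] != target[0]):
            return None
        for p, t in zip(pattern[1:], target[1:]):
            subst = match(p, t, subst)
            if subst is None:
                return None
        return subst
    if pattern in VARIABLES:
        if pattern in subst:
            return subst if subst[pattern] == target else None
        return {**subst, pattern: target}
    return subst if pattern == target else None

def subsumes(c1, c2, subst=None):
    """True if some substitution maps every indexed literal of c1
    (a list) onto an indexed literal of c2 with the same index."""
    subst = {} if subst is None else subst
    if not c1:
        return True
    (sign, pred, args, idx), rest = c1[0], c1[1:]
    for (s2, p2, a2, i2) in c2:
        if (sign, pred, idx) == (s2, p2, i2) and len(args) == len(a2):
            new = match((pred,) + args, (p2,) + a2, subst)
            if new is not None and subsumes(rest, c2, new):
                return True
    return False
```

With this encoding, the pair of clauses mentioned above fails the test only because of the differing indices; with equal indices the substitution mapping `y` to `0` would succeed.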

Definition 5. Let $\prec$ be an order on ground indexed literals. Let $c$ be a ground
clause containing an indexed literal $A{:}a$. An indexed literal $A{:}a$ is maximal in $c$
if there is no literal $B{:}b \in c$ for which $A{:}a \prec B{:}b$.

We say that a selection function $\Sigma$ is obtained by lifting $\prec$ if it meets the
condition that, for every clause $c$ and indexed literal $A{:}a \in c$, if $c$ has an instance
$c\Theta$ in which $(A{:}a)\Theta$ is maximal, then $A{:}a \in \Sigma(c)$.

The significance of Definition 5 lies in the following completeness result, which 
has a simple proof. It can be obtained by modifying the completeness proof for 
lock resolution of [CL73], or the one in [FLTZ93]. 

Lemma 7. Let $C$ be a set of initial clauses. Let $\Sigma$ be a selection function
obtained by lifting some order $\prec$. Let $\overline{C}$ be a set of indexed clauses satisfying the
following:

- For every clause $c = \{A_1, \ldots, A_p\} \in C$, there exists an indexed clause $d \in \overline{C}$
that subsumes some indexing $\{A_1{:}a_1, \ldots, A_p{:}a_p\}$ of $c$.

- For every clause $c$ that can be derived from clauses in $\overline{C}$, either by resolution
or by factoring, there is a clause $d \in \overline{C}$ that subsumes $c$.

Then, if $C$ is unsatisfiable, $\overline{C}$ contains the empty clause.

4.2 The $S^2$-Class

The $S^2$-class characterizes the format of clauses obtained from clausifications of
formulas in the Gödel class. The equality-free formulas produced by the
transformation of Section 3 can be transformed directly into $S^2$ by Skolemization.
Formulas in full $\mathcal{L}^2$ can be transformed through the Gödel class, or directly through
a structural clause transformation, as is done in [FLTZ93].

Definition 6. A clause $c$ is in $S^2$ if it meets the following conditions:

1. $c$ contains at most 2 variables, and $c$ contains no nested function symbols.

2. If $c$ contains ground literals, then it is a ground clause.

3. Each functional term in $c$ contains all variables occurring in $c$.

4. There is a literal in $c$ that contains all variables of $c$.

Note that conditions 2 and 4 can be ensured by application of the splitting rule.
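
The four conditions of Definition 6 are purely syntactic, so membership can be checked mechanically. The Python sketch below assumes a hypothetical term encoding (variables are the strings in `VARIABLES`, other strings are constants, and a functional term is a tuple `(f, arg1, ..., argk)`) and reads "nested function symbols" as a functional term occurring as an argument of another functional term; both choices are assumptions made for illustration.

```python
VARIABLES = {"x", "y", "z"}   # assumed naming convention for variables

def term_vars(t):
    """Set of variables occurring in a term."""
    if isinstance(t, tuple):                      # functional term (f, args...)
        return set().union(*(term_vars(a) for a in t[1:]))
    return {t} if t in VARIABLES else set()

def lit_vars(lit):
    """Set of variables of a literal (sign, predicate, args)."""
    return set().union(*(term_vars(t) for t in lit[2]))

def in_S2(clause):
    """Check conditions 1-4 of Definition 6 for a list of literals."""
    cvars = set().union(*(lit_vars(l) for l in clause))
    funcs = [t for l in clause for t in l[2] if isinstance(t, tuple)]
    # 1. at most two variables, no functional term inside a functional term
    cond1 = len(cvars) <= 2 and all(
        not isinstance(a, tuple) for t in funcs for a in t[1:])
    # 2. a ground literal may only occur in a ground clause
    cond2 = all(lit_vars(l) for l in clause) or not cvars
    # 3. every functional term contains all variables of the clause
    cond3 = all(term_vars(t) == cvars for t in funcs)
    # 4. some literal contains all variables of the clause
    cond4 = not clause or any(lit_vars(l) == cvars for l in clause)
    return cond1 and cond2 and cond3 and cond4
```

For example, a clause such as `p(x) v q(y)` fails condition 4, which is exactly the situation in which the splitting rule applies.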

The results of Section 3 thus suffice to reduce satisfiability in $\mathcal{L}^2_\approx$ to
satisfiability in $S^2$. However, closer examination of the transformation allows us to
be slightly more specific about the class of clauses we need to consider.
Starting with an arbitrary $\mathcal{L}^2_\approx$-formula $\phi$, Theorem 1 guarantees the existence of an
equisatisfiable formula $\phi^*$ which is a conjunction of $\mathcal{L}^2$-formulas in prenex form
with quantifier prefixes $\forall x$, $\forall x \forall y$ and $\forall x \exists y$. Skolemizing and clausifying $\phi^*$ thus
yields a collection of clauses in which all function symbols have arity 1.
Furthermore, given that the clauses in question are to be indexed in some way, there is
nothing to stop us indexing two-variable literals with index 1 and literals with
fewer than two variables with index 0. This leads us to the definition:

Definition 7. The class $S^{2+}$ is defined as the class $S^2$ but with additional
conditions:

1. All functions are 1-place.

2. If $c$ contains 2 variables, then the indexed literals with 2 variables are exactly
the literals with index 1.

Notice, incidentally, that Condition 3 of Definition 6 and Condition 1 of
Definition 7 immediately imply that all two-variable clauses are function-free.

At first sight, it might appear that we are making no use of the possibility
of giving different indices to distinct occurrences of the same literal. However,
resolving $\{p(x,y){:}1,\ q(x,y){:}1,\ q(x,x){:}0\}$ with $\{\neg p(x,x){:}1\}$ yields the clause
$\{q(x,x){:}1,\ q(x,x){:}0\}$, in which different indices are used for distinct occurrences
of the same literal. To obtain a decision procedure for $S^{2+}$ it suffices to
define a selection function which is obtained by lifting, and which ensures that
all derived clauses are within $S^{2+}$. Correctness follows from the soundness of
resolution and Lemma 7; termination follows from the fact that for a given finite
signature, there exist only finitely many non-equivalent clauses in $S^{2+}$.



4.3 A Decision Procedure for the $S^{2+}$-Class

We begin by establishing a suitable order on ground literals.

Definition 8. Let the order $\prec_2$ on ground indexed literals be defined as follows:

- If literal $A$ is strictly less deep than $B$, then $A{:}a \prec_2 B{:}b$.

- If literals $A$ and $B$ have the same depth, and $a < b$, then $A{:}a \prec_2 B{:}b$.

Next, we give the selection function $\Sigma_2$ used in the second phase of resolution.

Definition 9. Every indexed literal $A{:}a$ in a clause $c$ is selected, unless one of
the following two conditions holds:

- $A$ has no functional terms, $a = 0$, and there is a literal $B{:}b$ in $c$ with $b = 1$.

- $A{:}a$ has no functional terms, and there are literals with functional terms in
$c$.
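
Definitions 8 and 9 translate directly into code. The Python sketch below is illustrative and rests on assumptions not fixed by the text: an indexed literal is a pair `(literal, index)` with `literal = (sign, predicate, args)`, a functional term is a tuple `(f, arg1, ..., argk)`, and the depth of a variable or constant is taken to be 0.

```python
def has_function(lit):
    """True if some argument of the literal is a functional term."""
    return any(isinstance(t, tuple) for t in lit[2])

def depth(t):
    """Nesting depth of a term; variables and constants have depth 0."""
    return 1 + max((depth(a) for a in t[1:]), default=0) if isinstance(t, tuple) else 0

def lit_depth(lit):
    return max((depth(t) for t in lit[2]), default=0)

def prec2(l1, l2):
    """The order of Definition 8 on (ground) indexed literals."""
    (A, a), (B, b) = l1, l2
    return lit_depth(A) < lit_depth(B) or (lit_depth(A) == lit_depth(B) and a < b)

def select(clause):
    """Indexed literals of `clause` selected according to Definition 9."""
    some_index_one = any(i == 1 for _, i in clause)
    some_functional = any(has_function(l) for l, _ in clause)
    selected = []
    for lit, idx in clause:
        blocked_by_index = not has_function(lit) and idx == 0 and some_index_one
        blocked_by_depth = not has_function(lit) and some_functional
        if not (blocked_by_index or blocked_by_depth):
            selected.append((lit, idx))
    return selected
```

On the clause $\{p(x,y){:}1,\ q(x,y){:}1,\ q(x,x){:}0\}$ this selects the two literals with index 1 and blocks $q(x,x){:}0$.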



Lemma 8. $\Sigma_2$ is obtained by lifting $\prec_2$.

Proof. Let $c$ be an arbitrary clause in $S^{2+}$. Let $A{:}a$ be a literal in $c$ that is not
selected. We need to show that in every instance $c\Theta$ of $c$, the literal $A\Theta{:}a$ is
non-maximal. First observe that $A$ cannot have any functional terms.

If there is a literal $B{:}b$ with functional terms in $c$, then this functional term
contains all variables of $A$. So, for every substitution $\Theta$ it is the case that $B\Theta$ is
deeper than $A\Theta$. As a consequence, $A\Theta{:}a$ is non-maximal in $c\Theta$.

If there is no literal $B{:}b$ with functional terms, then $a = 0$ and there must be
a $B{:}b \in c$ with $b = 1$. If $c$ is a one-variable clause, then $A\Theta$ and $B\Theta$ always
have the same depth, for every substitution $\Theta$. Because of this $A\Theta{:}0 \prec_2 B\Theta{:}1$.
Otherwise Condition 2 applies to $c$. Literal $A$ contains 1 variable and literal
$B$ contains 2 variables. Write $A[X]{:}0$ and $B[X,Y]{:}1$. Let $\Theta$ be a
substitution. If $Y\Theta$ is deeper than $X\Theta$, then $B[X,Y]\Theta$ is deeper than $A[X]\Theta$ and
$A[X]\Theta{:}0 \prec_2 B[X,Y]\Theta{:}1$. Otherwise $A[X]\Theta$ and $B[X,Y]\Theta$ have equal depth
and also $A[X]\Theta{:}0 \prec_2 B[X,Y]\Theta{:}1$. This completes the proof.

Note how the indices play an essential role in the definition of $\Sigma_2$: no selection
function obtained by lifting an ordering on unindexed literals could ensure that
$p(x,y)$ is always preferred over $p(x,x)$, since the literals have a common instance.
The next step is to show that resolution with $\Sigma_2$ never leads outside $S^{2+}$.

Lemma 9. Each literal selected by $\Sigma_2$ contains all variables of its clause, and
contains a deepest occurrence of each variable in the clause.

Proof. If the clause is ground, then the lemma is trivial. If the clause has one
variable, then all literals have one variable, by condition 2. Moreover, if there
exist any functional literals in the clause at all, any selected literal is functional,
and so contains a deepest occurrence of its variable by condition 1. If the clause is
two-variable, it must be non-functional by conditions 3 and 1. Thus, any selected
literal must have index 1, and hence contains both variables by condition 2.



Lemma 10. The strategy keeps clauses inside $S^{2+}$.

Proof. It is quite standard to show that every selection function that satisfies
the conditions of Lemma 9 preserves Conditions 1-4 of Definition 6. See for
example the remark on p. 115 of [FLTZ93]. Condition 1 of Definition 7 is obviously
preserved by resolution. The only difficulty lies in showing that condition 2 applies to the
resolvent of any two clauses in $S^{2+}$. Let $c$ be such a resolvent, then, and assume
that $c$ is a two-variable clause.

Since all functions are unary, it is obvious that any two-variable literal in $c$ must
have come from a two-variable literal in one of the parent clauses, and hence
must have index 1.

Conversely, we must show that any 1-indexed literal in $c$ is two-variable. Let the
parents of $c$ be $c_1$ and $c_2$, let $B_1 \in c_1$ and $B_2 \in c_2$ be the literals resolved upon,
and let $\Theta$ be the substitution used in this resolution. Without loss of generality,
any 1-indexed literal in $c$ may be written $A\Theta{:}1$, where $A{:}1 \in c_1$. By Lemma 9,
$\text{Vars}(c_1) = \text{Vars}(B_1)$ and $\text{Vars}(c_2) = \text{Vars}(B_2)$. Hence $\text{Vars}(c_1\Theta) = \text{Vars}(B_1\Theta)$
and $\text{Vars}(c_2\Theta) = \text{Vars}(B_2\Theta)$. But we also have $\text{Vars}(B_1\Theta) = \text{Vars}(B_2\Theta)$, whence
$\text{Vars}(c_1\Theta) = \text{Vars}(c_2\Theta)$. Thus, $\text{Vars}(c) \subseteq \text{Vars}(c_1\Theta) \cup \text{Vars}(c_2\Theta) = \text{Vars}(c_1\Theta)$.
Since $c_1$ is in $S^{2+}$, condition 2 implies $\text{Vars}(A) = \text{Vars}(c_1)$, whence $\text{Vars}(c) \subseteq
\text{Vars}(A\Theta)$ as required.

Gathering together the lemmas in this section, we have 

Theorem 2. The rules of resolution, factoring and subsumption give us a
decision procedure for sets of clauses in the class $S^{2+}$.

This completes the description of the resolution procedure for the two-variable fragment.

We end the section with a technical lemma that was needed for the proof of
Lemma 4.

Lemma 11. Let $c_1, \ldots, c_m$ and $r$ be sets of propositional clauses. Let $r$ be closed
under resolution. Furthermore assume that each possible resolvent between a
clause of a $c_k$ and a clause of $r$ is in $c_k$. Then, if $c_1 \wedge \cdots \wedge c_m$ is consistent
and $r$ does not contain the empty clause, then $c_1 \wedge \cdots \wedge c_m \wedge r$ is consistent.

Proof. The result can be easily obtained from the completeness of semantic
resolution. Semantic resolution is obtained by fixing an interpretation $I$, and by
forbidding resolution steps between two clauses that are both true in $I$. Semantic
resolution is proven complete in [CL73].

Since $c_1 \wedge \cdots \wedge c_m$ is consistent, there is an interpretation $I$ that makes
$c_1 \wedge \cdots \wedge c_m$ true. If $c_1 \wedge \cdots \wedge c_m \wedge r$ were inconsistent, then $r$ would contain the
empty clause. All clauses that are false in $I$ must be in $r$. Hence a resolution
step involving a false clause is always allowed.
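
The statement of Lemma 11 can be explored on small propositional examples. The Python sketch below is an illustration only: clauses are encoded, by assumption, as `frozenset`s of non-zero integers in the style of the DIMACS convention (`-n` is the negation of atom `n`), and consistency is decided by brute-force enumeration of interpretations. The helper names are hypothetical.

```python
from itertools import product

def resolvents(c1, c2):
    """All propositional resolvents of two clauses."""
    return {(c1 - {l}) | (c2 - {-l}) for l in c1 if -l in c2}

def closure(clauses):
    """Close a clause set under resolution."""
    closed = set(clauses)
    while True:
        new = {r for a in closed for b in closed for r in resolvents(a, b)}
        if new <= closed:
            return closed
        closed |= new

def consistent(clauses):
    """Brute-force satisfiability test."""
    atoms = sorted({abs(l) for c in clauses for l in c})
    for values in product([False, True], repeat=len(atoms)):
        model = dict(zip(atoms, values))
        if all(any(model[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def hypotheses_hold(cs, r):
    """r is closed under resolution, and every resolvent between a clause
    of some c_k and a clause of r already belongs to that c_k."""
    r_closed = closure(r) <= set(r)
    absorbed = all(res in ck for ck in cs for a in ck for b in r
                   for res in resolvents(a, b))
    return r_closed and absorbed
```

For instance, with `r = closure({{1, 2}, {-2, 3}})` and a single set `c_1 = {{-1}, {2}, {3}}`, the hypotheses hold, `c_1` is consistent, `r` lacks the empty clause, and the union is indeed consistent, as the lemma predicts.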

5 Conclusions 

In this paper we have given a practical procedure for deciding satisfiability in the
two-variable fragment with equality. This procedure involves two new
contributions. The first is the use of resolution to transform formulas in the two-variable
fragment with equality to the two-variable fragment without equality. The second
is a new resolution-based procedure for deciding satisfiability in the two-variable
fragment without equality based on a selection function obtained by lifting an
order on ground indexed literals.

At this moment, handling of constants is unsatisfactory. We hope to be able 
to adapt Definition 2 and Lemma 4 in such a way that they can handle constants. 
Abstract. We present a calculus for first-order theorem proving in the
presence of the axioms of totally ordered divisible abelian groups. The 
calculus extends previous superposition or chaining calculi for divisible 
torsion-free abelian groups and dense total orderings without endpoints. 
As its predecessors, it is refutationally complete and requires neither ex- 
plicit inferences with the theory axioms nor variable overlaps. It offers 
thus an efficient way of treating equalities and inequalities between ad- 
ditive terms over, e. g., the rational numbers within a first-order theorem 
prover. 



1 Introduction 

Most real life problems for an automated theorem prover contain both uninter- 
preted function and predicate symbols, that are specific for a particular domain, 
and standard algebraic structures, such as numbers or orderings. General the- 
orem proving techniques like resolution or superposition are notoriously bad at 
handling algebraical theories involving axioms like associativity, commutativity, 
or transitivity, since explicit inferences with these axioms lead to an explosion of 
the search space. To deal efficiently with such structures, it is therefore necessary 
that specialized techniques are built tightly into the prover. 

AC-superposition (Bachmair and Ganzinger [1], Wertz [12]) is a well-known
example of such a technique. It incorporates associativity and commutativity into
the standard superposition calculus using AC-unification and extended clauses.
In this way, inferences with the theory axioms and certain inferences involving
variables are rendered unnecessary. Still, reasoning with the associativity and
commutativity axioms remains difficult for an automated theorem prover, even if
explicit inferences with the AC axioms can be avoided. This is not only due to the
NP-completeness of the AC-unifiability problem, but it stems also from the fact
that AC-superposition requires an inference between literals $u_1 + \cdots + u_k \approx s$ and
$v_1 + \cdots + v_l \approx t$ (via extended clauses) whenever some $u_i$ is unifiable with some
$v_j$. Consequently, a variable in a sum can be unified with any part of any other
sum; in this situation unification is completely unable to limit the search space.

The inefficiency inherent in the theory of associativity and commutativity
can be mitigated by integrating further axioms into the calculus. In abelian
groups (or even in cancellative abelian monoids) the ordering conditions of the
inference rules can be refined in such a way that summands $u_i$ and $v_j$ have
to be overlapped only if they are maximal with respect to some simplification
ordering (Ganzinger and Waldmann [4,8], Marche [5], Stuber [7]). In this way,
the number of variable overlaps can be greatly reduced; however, inferences with
unshielded, i. e., potentially maximal, variables remain necessary.

In non-trivial divisible torsion-free abelian groups (e.g., the rational numbers
and rational vector spaces), the abelian group axioms are extended by the
torsion-freeness axiom $\forall k \in \mathbb{N}^{>0}\ \forall x, y\colon kx \approx ky \Rightarrow x \approx y$, the divisibility axiom
$\forall k \in \mathbb{N}^{>0}\ \forall x\ \exists y\colon ky \approx x$, and the non-triviality axiom $\exists y\colon y \not\approx 0$. In such
structures every clause can be transformed into an equivalent clause without
unshielded variables. Integrating this variable elimination algorithm into cancellative
superposition results in a calculus that requires neither extended clauses, nor
variable overlaps, nor explicit inferences with the theory axioms. Furthermore,
using full abstraction even AC unification can be avoided (Waldmann [10]).

When we want to work with a transitive relation $>$ in a theorem prover,
we encounter a situation that is surprisingly similar to the one depicted above.
Just like associativity and commutativity, the transitivity axiom is fairly prolific.
It allows one to derive a new clause whenever the right-hand side of a literal $r > s$
overlaps with the left-hand side of another literal $s' > t$. As such an overlap is
always possible if $s$ or $s'$ is a variable, unification is not an effective filter to control
the generation of new clauses. The use of the chaining inference rule makes
explicit inferences with the transitivity axiom superfluous (Slagle [6]). Since this
inference rule can be equipped with the restriction that the overlapped term
$s$ must be maximal with respect to a simplification ordering $\succ$, overlaps with
shielded variables again become unnecessary. Only inferences with unshielded,
i.e., potentially maximal, variables have to be computed.

Once more, the number of unshielded variables in a clause can be reduced 
if further axioms are available. In particular, in dense total orderings without 
endpoints, unshielded variables can be eliminated completely (Bachmair and 
Ganzinger [3]). 

Two facts suggest investigating the combination of the theory
of divisible torsion-free abelian groups and the theory of dense total orderings
without endpoints. On the one hand, the vast majority of applications of divisible
torsion-free abelian groups (and in particular of the rationals or reals) also requires
an ordering; so the combined calculus is likely to be much more useful in
practice than the DTAG-superposition calculus on which it is based. On the
other hand, these two theories are closely related: An abelian group $(G, +, 0)$
can be equipped with a total ordering that is compatible with $+$ if and only
if it is torsion-free; furthermore divisibility and compatibility of the ordering
imply that the ordering is dense and has no endpoints. One can thus assume
that the two calculi fit together rather smoothly. We show in this paper that
this is in fact true. The resulting calculus splits again into two parts: The first
one is a base calculus that works on clauses without unshielded variables, but
whose rules may produce clauses with unshielded variables. This calculus has
the property that saturated sets of clauses are unsatisfiable if and only if they
contain the empty clause, but it cannot be used to effectively saturate a given
set of clauses. The second part of the calculus is a variable elimination algorithm
that makes it possible to get rid of unshielded variables, and thus renders the
base calculus effective. The integration of these two components happens in
essentially the same way as in the equational case (Waldmann [10]).

In this extended abstract, we can only sketch the main ideas of our work. 
The reader is referred to the full version [11] for the proofs. 

2 The Base Calculus 

2.1 Preliminaries 

We work in a many-sorted framework and assume that the function symbol $+$
is declared on a sort $G$. If $t$ is a term of sort $G$ and $n \in \mathbb{N}$, then $nt$ is an
abbreviation for the $n$-fold sum $t + \cdots + t$; in particular, $0t = 0$ and $1t = t$.
Analogously, $\sum_{i \in \{1,\ldots,n\}} t_i$ is an abbreviation for the sum $t_1 + \cdots + t_n$.

Without loss of generality we assume that the equality relation $\approx$ and the
semantic ordering $>$ are the only predicates of our language. Hence a literal is
either an equation $t \approx t'$, or a negated equation $t \not\approx t'$, where $t$ and $t'$ have the
same sort, or an inequation $t > t'$, or a negated inequation $t \not> t'$, where $t$ and
$t'$ have sort $G$. Occasionally we write $t' < t$ instead of $t > t'$. The symbol $\gtrless$
denotes either $>$ or $<$, the symbol $\sim$ stands for $\gtrless$ or $\approx$, and $\dot\sim$ denotes $\gtrless$ or $\approx$
or $\not\approx$. The equality symbol is supposed to be symmetric. Multiple occurrences of
one of the symbols $\gtrless$, $\sim$, or $\dot\sim$ within a single inference rule denote consistently
the same relation. A clause is a finite multiset of literals, usually written as a
disjunction.

The clauses

$(x + y) + z \approx x + (y + z)$    (Associativity (A))
$x + y \approx y + x$    (Commutativity (C))
$x + 0 \approx x$    (Identity (U))
$(-x) + x \approx 0$    (Inverse (Inv))
$n\,\mathit{divided\text{-}by}_n(x) \approx x$    (Divisibility (Div))
$a_0 \not\approx 0$    (Non-Triviality (Nt))
$x \not> x$    (Irreflexivity (Ir))
$x \not> y \lor y \not> z \lor x > z$    (Transitivity (Tr))
$x \not> y \lor x + z > y + z$    (Monotonicity (Mon))
$x > y \lor y > x \lor x \approx y$    (Totality (Tot))

plus the equality axioms^1 are the axioms ODAG of totally ordered divisible
abelian groups.

^1 including the congruence axiom $x \not\approx y \lor y \not> z \lor x > z$ for the predicate $>$.
The following clauses are consequences of these axioms (for every $\psi \in \mathbb{N}^{>0}$):

$x + z \not\approx y + z \lor x \approx y$    (Cancellation (K))
$\psi x \not\approx \psi y \lor x \approx y$    (Torsion-Freeness (T))
$x + z \not> y + z \lor x > y$    (>-Cancellation (K$^>$))
$\psi x \not> \psi y \lor x > y$    (>-Torsion-Freeness (T$^>$))

We write OTfCAM for the union of the clauses A, C, U, Ir, Tr, Mon, K, T,
K$^>$, and the equality axioms.

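We denote the entailment relation modulo ODAG by $\models_{\mathrm{ODAG}}$, and the
entailment relation modulo OTfCAM by $\models_{\mathrm{OTfCAM}}$. That is, $\{C_1, \ldots, C_n\} \models_{\mathrm{ODAG}} C_0$
if and only if $\{C_1, \ldots, C_n\} \cup \mathrm{ODAG} \models C_0$, and $\{C_1, \ldots, C_n\} \models_{\mathrm{OTfCAM}} C_0$ if and
only if $\{C_1, \ldots, C_n\} \cup \mathrm{OTfCAM} \models C_0$.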

A function symbol is called free, if it is different from $0$ and $+$. A term is
called atomic, if it is not a variable and its top symbol is different from $+$. We
say that a term $t$ occurs at the top of $s$, if there is a position $o \in \mathrm{pos}(s)$ such
that $s|_o = t$ and for every proper prefix $o'$ of $o$, $s(o')$ equals $+$; the term $t$ occurs
in $s$ below a free function symbol, if there is an $o \in \mathrm{pos}(s)$ such that $s|_o = t$
and $s(o')$ is a free function symbol for some proper prefix $o'$ of $o$. A variable $x$
is called shielded in a clause $C$, if it occurs at least once below a free function
symbol in $C$, or if it does not have sort $G$. Otherwise, $x$ is called unshielded.
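
The notion of a shielded variable can be illustrated by a small Python sketch. The term encoding (variables as strings, other terms as tuples headed by their symbol), the function names, and the simplifying assumption that every variable has sort G are choices made for this illustration only.

```python
# Terms: a variable is a str; any other term is a tuple (symbol, arg1, ..., argn).
# Constants are 1-tuples such as ("a",). "+" and "0" are the theory symbols;
# every other symbol counts as a free function symbol.
THEORY_SYMBOLS = {"+", "0"}

def variable_occurrences(term, below_free=False):
    """Yield (variable, occurs_below_a_free_symbol) for each variable occurrence."""
    if isinstance(term, str):
        yield term, below_free
        return
    symbol, *args = term
    inside = below_free or symbol not in THEORY_SYMBOLS
    for arg in args:
        yield from variable_occurrences(arg, inside)

def unshielded_variables(clause):
    """Return the variables with no occurrence below a free function symbol.

    Assumption of this sketch: every variable has sort G.
    A clause is a list of literals (relation, lhs, rhs).
    """
    seen, shielded = set(), set()
    for _relation, lhs, rhs in clause:
        for side in (lhs, rhs):
            for var, below in variable_occurrences(side):
                seen.add(var)
                if below:
                    shielded.add(var)
    return seen - shielded

# x + f(y) > a  or  z = 0: y is shielded by f, x and z are not.
example = [(">", ("+", "x", ("f", "y")), ("a",)), ("=", "z", ("0",))]
unshielded = unshielded_variables(example)
```

A variable that occurs both at the top of a sum and below a free symbol counts as shielded, since one shielded occurrence suffices.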

A clause $C$ is called fully abstracted, if no non-variable term of sort $G$ occurs
below a free function symbol in $C$. Every clause $C$ can be transformed into an
equivalent fully abstracted clause $\mathrm{abs}(C)$ by iterated rewriting

$C[f(\ldots, t, \ldots)] \;\to\; x \not\approx t \lor C[f(\ldots, x, \ldots)]$,

where $x$ is a new variable and $t$ is a non-variable term of sort $G$ occurring
immediately below the free function symbol $f$ in $C$.
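
The abstraction step can be sketched in Python as follows. The tuple encoding of terms, the fresh-variable names `_x0`, `_x1`, ..., and the assumption that all terms have sort G are illustrative choices, not part of the calculus.

```python
import itertools

THEORY_SYMBOLS = {"+", "0"}  # every other symbol is treated as free

def abstract_clause(clause):
    """Return a fully abstracted version of a clause.

    Terms are variables (str) or tuples (symbol, *args); a clause is a list
    of literals (relation, lhs, rhs). Assumptions of this sketch: all terms
    have sort G, and names of the form _x<i> do not occur in the input.
    """
    fresh = (f"_x{i}" for i in itertools.count())
    extra = []  # literals x != t introduced by abstraction

    def walk(term, directly_below_free):
        if isinstance(term, str):  # variables are never abstracted
            return term
        symbol, *args = term
        is_free = symbol not in THEORY_SYMBOLS
        rebuilt = (symbol, *(walk(arg, is_free) for arg in args))
        if directly_below_free:
            # Replace the non-variable argument by a new variable x
            # and record the literal x != t.
            var = next(fresh)
            extra.append(("!=", var, rebuilt))
            return var
        return rebuilt

    literals = [(rel, walk(lhs, False), walk(rhs, False)) for rel, lhs, rhs in clause]
    return extra + literals

# f(a + y) > 0 becomes _x0 != a + y  or  f(_x0) > 0.
abstracted = abstract_clause([(">", ("f", ("+", ("a",), "y")), ("0",))])
```

Arguments are processed before the enclosing term, so nested free symbols are abstracted from the inside out.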

We say that an ACU-compatible ordering $\succ$ has the multiset property, if
whenever a ground atomic term $u$ is greater than $v_i$ for every $i$ in a finite
non-empty index set $I$, then $u \succ \sum_{i \in I} v_i$. Every reduction ordering over terms not
containing $+$ that is total on ground terms and for which $0$ is minimal can be
extended to an ordering that is ACU-compatible and has the multiset property
(Waldmann [9]).^2

From now on we will work only with ACU-congruence classes, rather than 
with terms. So all terms, equations, substitutions, inference rules, etc., are to 
be taken modulo ACU, i.e., as representatives of their congruence classes. The 
symbol $\succ$ will always denote an ACU-compatible ordering that has the multiset
property, is total on ground ACU-congruence classes, and satisfies $t \not\succ s[t]_o$ for
every term $s[t]_o$.

Let A be a ground literal. Then the largest atomic term occurring on either 
side of A is denoted by mt(A). 

^2 In fact, we use the extended ordering only as a theoretical device; as we work with
fully abstracted clauses, the original reduction ordering is sufficient for actual
computations.

The ordering $\succ_L$ on literals compares lexicographically first the maximal
atomic terms of the literals, then the polarities (negative $\succ$ positive), then the
kinds of the literals (inequation $\succ$ equation), then the number of the sides of
the literals on which the maximal atomic term occurs, then the multisets of
all non-zero terms occurring at the top of the literals, and finally the multisets
$\{\{s\},\{t\}\}$ (for equations $[\neg]\, s \approx t$) or $\{\{s, s\},\{t\}\}$ (for inequations $[\neg]\, s > t$).
The ordering $\succ_C$ on clauses is the multiset extension of the literal ordering $\succ_L$.
Both $\succ_L$ and $\succ_C$ are noetherian and total on ground literals/clauses.

2.2 Superposition and Chaining 

We present the ground versions of the inference rules of the base calculus OCInf. 
The non-ground versions can be obtained by lifting in a rather straightforward 
way (see below). 

Let us start the presentation of the inference rules with a few general
conventions: Every term occurring in a sum is assumed to have sort $G$. The letters $u$ and
$v$, possibly with indices, denote atomic terms, unless explicitly said otherwise.
In an expression like $mu + s$, $m$ is a natural number, $s$ may be zero.

If an inference involves a literal, then it must be maximal in the respective 
clause (except for the last but one literal in factoring inferences). A positive 
literal that is involved in a superposition or chaining inference must be strictly 
maximal in the respective clause. In all superposition or chaining inferences, the 
left premise is smaller than the right premise. 

Cancellation

$$\frac{C' \lor mu + s \mathrel{\dot\sim} m'u + s'}{C' \lor (m - m')u + s \mathrel{\dot\sim} s'}$$

if $m \geq m' \geq 1$, $u \succ s$, $u \succ s'$.

Equality Resolution

$$\frac{C' \lor u \not\approx u}{C'}$$

if $u$ either equals $0$ or does not have sort $G$.

Inequality Resolution

$$\frac{C' \lor 0 > 0}{C'}$$

Canc. Superposition

$$\frac{D' \lor nu + t \approx t' \qquad C' \lor mu + s \mathrel{\dot\sim} s'}{D' \lor C' \lor ns + mt' \mathrel{\dot\sim} ns' + mt}$$

if $n \geq 1$, $m \geq 1$, $u \succ s$, $u \succ s'$, $u \succ t$, $u \succ t'$.^3

^3 If $\gcd(m, n) > 1$, then the conclusion of this inference can be simplified to
$D' \lor C' \lor \psi s + \chi t' \mathrel{\dot\sim} \psi s' + \chi t$, where $\psi = n/\gcd(m, n)$ and $\chi = m/\gcd(m, n)$ (and
similarly for the following inference rules). To enhance readability, we leave out this
optimization in the sequel.

Canc. Chaining

$$\frac{D' \lor t' \gtrless nu + t \qquad C' \lor mu + s \gtrless s'}{D' \lor C' \lor ns + mt' \gtrless ns' + mt}$$

if $n \geq 1$, $m \geq 1$, $u \succ s$, $u \succ s'$, $u \succ t$, $u \succ t'$.

Std. Superposition

$$\frac{D' \lor u \approx u' \qquad C' \lor s[u] \mathrel{\dot\sim} s'}{D' \lor C' \lor s[u'] \mathrel{\dot\sim} s'}$$

if $u$ occurs in a maximal atomic subterm of $s$
and does not have sort $G$, $u \succ u'$, $s[u] \succ s'$.

Canc. Eq. Factoring

$$\frac{C' \lor nu + t \approx t' \lor mu + s \approx s'}{C' \lor mt + ns' \not\approx mt' + ns \lor nu + t \approx t'}$$

if $n \geq 1$, $m \geq 1$, $u \succ s$, $u \succ s'$, $u \succ t$, $u \succ t'$.

Canc. Ineq. Factoring (I)

$$\frac{C' \lor nu + t \gtrless t' \lor mu + s \gtrless s'}{C' \lor mt + ns' \gtrless mt' + ns \lor mu + s \gtrless s'}$$

if $n \geq 1$, $m \geq 1$, $u \succ s$, $u \succ s'$, $u \succ t$, $u \succ t'$.

Canc. Ineq. Factoring (II)

$$\frac{C' \lor nu + t \gtrless t' \lor mu + s \gtrless s'}{C' \lor mt' + ns \gtrless mt + ns' \lor nu + t \gtrless t'}$$

if $n \geq 1$, $m \geq 1$, $u \succ s$, $u \succ s'$, $u \succ t$, $u \succ t'$.

Std. Eq. Factoring

$$\frac{C' \lor u \approx v' \lor u \approx u'}{C' \lor u' \not\approx v' \lor u \approx v'}$$

if $u$, $u'$ and $v'$ do not have sort $G$, $u \succ u'$, $u \succ v'$.
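
As a sanity check of the cancellative chaining rule, the following derivation sketches why its conclusion follows from its premises. It treats only the case where the relation is $>$, ignores the side clauses $D'$ and $C'$, and uses nothing beyond monotonicity and transitivity; "multiply by $m$" abbreviates forming an $m$-fold sum.

```latex
\begin{align*}
t' > nu + t \;&\Longrightarrow\; mt' + ns > mnu + mt + ns
  && \text{(multiply by } m \text{, add } ns \text{)}\\
mu + s > s' \;&\Longrightarrow\; mnu + ns + mt > ns' + mt
  && \text{(multiply by } n \text{, add } mt \text{)}\\
&\Longrightarrow\; ns + mt' > ns' + mt
  && \text{(transitivity)}
\end{align*}
```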



The inference rules of the calculus OCInf do not handle negative inequality
literals. We assume that in the beginning of the saturation process every literal
$s \not> t$ in an input clause is replaced by the two literals $t > s \lor t \approx s$, which are
equivalent to $s \not> t$ by the totality, transitivity and irreflexivity axioms. Note that
the inference rules of OCInf do not produce any new negative inequality literals.

Example 1. Let the ordering on constant symbols be given by $b \succ c \succ d$. We
will use the inference rules of OCInf to show that the following three clauses are
contradictory with respect to ODAG. (The maximal parts of every clause are
underlined.)

$\underline{3b} > 2d$    (1)
$\underline{b} > 2c$    (2)
$\underline{2b} \approx c + d$    (3)

Cancellative superposition of (3) and (1) yields

$\underline{3c} + 3d > 4d$    (4)

Cancellative superposition of (3) and (2) yields

$\underline{c} + d > \underline{4c}$    (5)

By cancellation of (5) we obtain

$d > \underline{3c}$    (6)

Cancellative chaining of (6) and (4) produces

$\underline{4d} > \underline{4d}$    (7)

which yields the empty clause by cancellation and inequality resolution.
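
The arithmetic of Example 1 can be replayed with a short Python sketch. It represents a ground sum of constants as a dictionary from constant names to coefficients and implements only the conclusions of cancellation, cancellative superposition and cancellative chaining for unit clauses, including the gcd simplification; the ordering side conditions are not checked, and all function names are hypothetical.

```python
from math import gcd

def scale(k, expr):
    """Multiply every coefficient of a sum by k."""
    return {const: k * coeff for const, coeff in expr.items()}

def add(e1, e2):
    """Add two sums coefficient-wise."""
    out = dict(e1)
    for const, coeff in e2.items():
        out[const] = out.get(const, 0) + coeff
    return out

def split(expr, u):
    """Split a sum into (coefficient of u, remaining summands)."""
    return expr.get(u, 0), {c: n for c, n in expr.items() if c != u}

def combine(n, t, t1, m, s, s1):
    """Return the sides ns + mt' and ns' + mt, divided by gcd(m, n)."""
    g = gcd(m, n)
    psi, chi = n // g, m // g
    return add(scale(psi, s), scale(chi, t1)), add(scale(psi, s1), scale(chi, t))

def canc_superposition(eq, lit, u):
    """From nu + t = t' and mu + s ~ s' derive ns + mt' ~ ns' + mt."""
    (_, eq_lhs, t1), (rel, lit_lhs, s1) = eq, lit
    n, t = split(eq_lhs, u)
    m, s = split(lit_lhs, u)
    return (rel, *combine(n, t, t1, m, s, s1))

def canc_chaining(left, right, u):
    """From t' > nu + t and mu + s > s' derive ns + mt' > ns' + mt."""
    (_, t1, left_rhs), (_, right_lhs, s1) = left, right
    n, t = split(left_rhs, u)
    m, s = split(right_lhs, u)
    return (">", *combine(n, t, t1, m, s, s1))

def cancellation(lit, u):
    """Cancel the common multiples of u on both sides of a literal."""
    rel, lhs, rhs = lit
    m, s = split(lhs, u)
    m1, s1 = split(rhs, u)
    k = min(m, m1)
    if m > k:
        s = add(s, {u: m - k})
    if m1 > k:
        s1 = add(s1, {u: m1 - k})
    return (rel, s, s1)

c1 = (">", {"b": 3}, {"d": 2})          # (1) 3b > 2d
c2 = (">", {"b": 1}, {"c": 2})          # (2) b > 2c
c3 = ("=", {"b": 2}, {"c": 1, "d": 1})  # (3) 2b = c + d
c4 = canc_superposition(c3, c1, "b")    # (4)
c5 = canc_superposition(c3, c2, "b")    # (5)
c6 = cancellation(c5, "c")              # (6)
c7 = canc_chaining(c6, c4, "c")         # (7), simplified by gcd(3, 3) = 3
final = cancellation(c7, "d")           # 0 > 0, refuted by inequality resolution
```

The chaining step shows the effect of the gcd simplification: without it, the conclusion would be $12d > 12d$ rather than $4d > 4d$.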

In the standard superposition calculus, lifting means replacing equality in
the ground inference by unifiability. As long as all variables in our clauses are
shielded, the situation is similar here: For instance, in the second premise $C' \lor A_1$
of a cancellative superposition inference the maximal literal $A_1$ need no longer
have the form $mu + s \mathrel{\dot\sim} s'$ with a unique maximal atomic term $u$. Rather, it
may contain several (distinct but ACU-unifiable) maximal atomic terms $u_k$ with
multiplicities $m_k$, where $k$ ranges over some finite non-empty index set $K$. We
thus obtain $A_1 = \sum_{k \in K} m_k u_k + s \mathrel{\dot\sim} s'$. In the inference rule, the substitution
$\sigma$ that unifies all $u_k$ (and the corresponding terms $v_l$ from the other premise) is
applied to the conclusion. Consequently, the cancellative superposition rule now has
the following form:

$$\frac{D' \lor \sum_{l \in L} n_l v_l + t \approx t' \qquad C' \lor \sum_{k \in K} m_k u_k + s \mathrel{\dot\sim} s'}{(D' \lor C' \lor ns + mt' \mathrel{\dot\sim} ns' + mt)\sigma}$$

where

(i) $m = \sum_{k \in K} m_k \geq 1$, $n = \sum_{l \in L} n_l \geq 1$.

(ii) $\sigma$ is a most general ACU-unifier of all $u_k$ and $v_l$ ($k \in K$, $l \in L$).

(iii) $u$ is one of the $u_k$ ($k \in K$).

(iv) $u\sigma \not\preceq s\sigma$, $u\sigma \not\preceq s'\sigma$, $u\sigma \not\preceq t\sigma$, $u\sigma \not\preceq t'\sigma$.

The other inference rules can be lifted in a similar way, again under the 
condition that all variables in the clauses are shielded. As usual, the standard 
superposition rule is equipped with the additional restriction that the subterm 
of s that is replaced during the inference is not a variable. For clauses with 
unshielded variables, lifting would be significantly more complicated; however, 
as we will combine the base calculus with an algorithm that eliminates unshielded 
variables, we need not consider this case. 

Theorem 2. The inference rules of the calculus OCInf are sound with respect
to $\models_{\mathrm{ODAG}}$.

Definition 3. Let $N$ be a set of clauses, let $\bar{N}$ be the set of ground instances
of clauses in $N$. An inference is called OCRed-redundant with respect to $N$ if
for each of its ground instances with conclusion $C_0\theta$ and maximal premise $C\theta$
we have $\{D \in \bar{N} \mid D \prec_C C\theta\} \models_{\mathrm{OTfCAM}} C_0\theta$. A clause $C$ is called
OCRed-redundant with respect to $N$, if for every ground instance $C\theta$,
$\{D \in \bar{N} \mid D \prec_C C\theta\} \models_{\mathrm{OTfCAM}} C\theta$.
2.3 Rewriting on Equations

To prove that the inference system described so far is refutationally complete we 
have to show that every saturated clause set that does not contain the empty 
clause has a model. The traditional approach to construct such a model is 
rewrite-based: First an ordering is imposed on the set of all ground instances 
of clauses in the set. Starting with an empty interpretation all such instances 
are inspected in ascending order. If a reductive clause is false and irreducible in 
the partial interpretation constructed so far, its maximal positive literal is turned 
into a rewrite rule and added to the interpretation. If the original clause set is 
saturated and does not contain the empty clause, then the final interpretation 
is a model of all ground instances, and thus of the original clause set (Bachmair 
and Ganzinger [2]). 

In order to be able to treat cancellative superposition we have modified this 
scheme in [4] in such a way that the rewrite relation operates on equations rather 
than on terms. But if we also have to deal with inequations, a further extension is 
necessary: We need to be able to rewrite inequations with inequations; and unlike 
rewriting with equations, this does of course not produce logically equivalent 
formulae. 

Definition 4. A ground equation or inequation $e$ is called a cancellative rewrite
rule with respect to $\succ$, if $\mathrm{mt}(e)$ does not occur on both sides of $e$.

We will usually drop the attributes “cancellative” and “with respect to $\succ$”,
speaking simply of “rewrite rules”.

Every rewrite rule has either the form $mu + s \sim s'$, where $u$ is an atomic
term, $m \in \mathbb{N}^{>0}$, $u \succ s$, and $u \succ s'$, or the form $u \approx s'$, where $u \succ s'$ and $u$
(and thus $s'$) does not have sort $G$. This is an easy consequence of the multiset
property of $\succ$.

Definition 5. Given a set $R$ of rewrite rules, the four binary relations $\to_{\gamma,R}$,
$\to_{\delta,R}$, $\to_{o,R}$, and $\to_\kappa$ on ground equations and inequations are defined (modulo
ACU) as follows:^4

(i) $mu + t \sim t' \to_{\gamma,R} s' + t \sim t' + s$,
if $mu + s \approx s'$ is a rule in $R$.

(ii) $t[s] \sim t' \to_{\delta,R} t[s'] \sim t'$,
if (i) $s \approx s'$ is a rule in $R$ and (ii) $s$ does not have sort $G$ or $s$ occurs in $t$
below some free function symbol.

(iii) $mu + t \gtrless t' \to_{o,R} s' + t \gtrless t' + s$,
if $mu + s \gtrless s'$ is a rule in $R$.

(iv) $u + t \sim u + t' \to_\kappa t \sim t'$,
$u \approx u \to_\kappa 0 \approx 0$,
if $u$ is atomic and different from $0$.

^4 While we have the restriction $u \succ s$, $u \succ s'$ for the rewrite rules, there is no such
restriction for the (in-)equations to which rules are applied.







The union of $\to_{\gamma,R}$, $\to_{\delta,R}$, $\to_{o,R}$, and $\to_\kappa$ is denoted by $\to_R$.

If $e \to_R e'$ using a $\gamma$-, $\delta$- or $\kappa$-step, then $e$ and $e'$ are equivalent modulo
OTfCAM and the applied rewrite rule. If $s \gtrless s' \to_{o,R} t \gtrless t'$, then both $t \gtrless t'$
and $t \approx t'$ imply $s \gtrless s'$ modulo OTfCAM and the applied rewrite rule.

We say that an (in-)equation $e$ is $\gamma$-reducible, if $e \to_{\gamma,R} e'$ (analogously for $\delta$,
$o$, and $\kappa$). It is called reducible, if it is $\gamma$-, $\delta$-, $o$-, or $\kappa$-reducible.

Unlike $o$- and $\kappa$-reducibility, $\gamma$- and $\delta$-reducibility can be extended to terms:
A term $t$ is called $\gamma$-reducible, if $t \approx t' \to_{\gamma,R} e'$, where the rewrite step takes
place at the left-hand side (analogously for $\delta$). It is called reducible, if it is $\gamma$- or
$\delta$-reducible.

Lemma 1. The relation $\to_R$ is contained in $\succ_L$ and thus noetherian.



Definition 6. Given a set R of rewrite rules, the relation — is defined by 
~^°R = ° ~^o,R o 

Definition 7. Given a set R of rewrite rules, the truth set tr(i?) of R is the set of 
all equations s « s' for which there exists a derivation s«s' — 0«0, and the 
set of all inequations s ^ s' for which there exists a derivation s ^ s' 0 ^ 0. 
The 'f'-truth set ivq,{R) of R is the set of all equations or inequations e = s ^ s', 
such that either e G tr(i?) and s does not have sort G, or -tps ~ 'ijjs' G tr(i?) for 
some f/' G 

All (in-)equations in tr^(i?) are logical consequences of the rewrite rules in R 
and the theory axioms OTfCAM. 

2.4 Model Construction 

Definition 8. A ground clause $C' \lor e$ is called reductive for $e$, if $e$ is a
cancellative rewrite rule and strictly maximal in $C' \lor e$.



Definition 9. Let $N$ be a set of (possibly non-ground) clauses that does not
contain the empty clause, and let $\bar{N}$ be the set of all ground instances of clauses
in $N$. Using induction on the clause ordering we define sets of rules $R_C$, $R_C^{\mathbb{Q}}$,
$E_C$, and $E_C^{\mathbb{Q}}$, for all clauses $C \in \bar{N}$. Let $C$ be such a clause and assume that $R_D$,
$R_D^{\mathbb{Q}}$, $E_D$, and $E_D^{\mathbb{Q}}$ have already been defined for all $D \in \bar{N}$ such that $C \succ_C D$.
Then the set $R_C$ of primary rules and the set $R_C^{\mathbb{Q}}$ of secondary rules are given
by

$R_C = \bigcup_{D \prec_C C} E_D$ and $R_C^{\mathbb{Q}} = \bigcup_{D \prec_C C} E_D^{\mathbb{Q}}$.

^5 As we deal only with ground terms and as there are no non-trivial contexts around
(in-)equations, this operation does indeed satisfy the definition of a rewrite relation,
albeit in an unorthodox way.

$E_C$ is the singleton set $\{e\}$, if $C$ is a clause $C' \lor e$ such that (i) $C$ is reductive
for $e$, (ii) $C$ is false in $\mathrm{tr}(R_C^{\mathbb{Q}})$, (iii) $C'$ is false in $\mathrm{tr}(R_C^{\mathbb{Q}} \cup \{e\})$, and (iv) $\chi\,\mathrm{mt}(e)$
is $\gamma\delta$-irreducible with respect to $R_C^{\mathbb{Q}}$ for every $\chi \in \mathbb{N}^{>0}$. Otherwise, $E_C$ is empty.

If $E_C = \{e\}$, then $E_C^{\mathbb{Q}}$ is the set of all rewrite rules $e' \in \mathrm{tr}_{\mathbb{Q}}(R_C^{\mathbb{Q}} \cup E_C)$ such
that $\mathrm{mt}(e') = \mathrm{mt}(e)$ and $e'$ is $\gamma\delta$-irreducible with respect to $R_C^{\mathbb{Q}}$. Otherwise, $E_C^{\mathbb{Q}}$
is empty.

Finally, the sets $R_\infty$ and $R_\infty^{\mathbb{Q}}$ are defined by

$R_\infty = \bigcup_{D \in \bar{N}} E_D$ and $R_\infty^{\mathbb{Q}} = \bigcup_{D \in \bar{N}} E_D^{\mathbb{Q}}$.

Our goal is to show that, if $N$ is saturated with respect to OCInf, then
$\mathrm{tr}(R_\infty^{\mathbb{Q}})$ is a model of the axioms of totally ordered divisible abelian groups and
of the clauses in $N$.

2.5 Refutational Completeness of OCInf 

The relations and are in general not confluent, not even in the purely 

equational case. One can merely show that that is confluent on equations 
in tr(i?^), that is, that any two derivations starting from an equation e can 
be joined, provided that there is a derivation e 0 « 0. But even this kind 
of restricted confluence does not hold for inequations, and in particular, not 
for o-rewriting. We can only prove that two derivations starting from the same 
inequation can be joined, if one of them leads to 0 > 0 and if the other one does 
not use o-steps. This property will be sufficient for our purposes, however. 

Definition 10. Let if be a set of equations and/or inequations. We say that 
the relation — is partially confluent on E, if for all equations Cq € E and Ci, 62 
with Cl Co — 62 there exists an equation 63 such that ci — 63 62, 

and if for all inequations b'q £ E and e[ with e\ ^*,5^ ^ Cg — >-Jj. 0 > 0 there is a 
derivation e'^ — 0 > 0. 

There is one important technical difference between the equational case de- 
veloped in (Waldmann [8]) and the inequational case that we consider here: In 
the equational case, one can show that is confluent on tr(i?j^), and hence 

that tr(i?,^) is a model of the theory axioms, without requiring that the set N of 
clauses is saturated. Saturation is only necessary to prove that tr(i?,^) is also a 
model of N . In the inequational case, such a separation does not work: Proving 
partial confluence of is only possible if we require that cancellative chain- 

ing inferences are redundant. For this reason, the proof that tr{R^) is partially 
confluent and the proof that tr(i?,^) is a model of N must be combined within 
a single induction. 

The following two lemmas are copied almost verbatim from (Waldmann [8]). 

Lemma 2. The relation is confluent on the equations in tr{Rc) for every 

C £ N. The relation -^rH' is confluent on the equations in tr{R^). 
Corollary 11. For every $C \in \bar{N}$, $\mathrm{tr}(R_C^{\mathbb{Q}})$ and $\mathrm{tr}(R_\infty^{\mathbb{Q}})$ satisfy ACUKT and the
equality axioms (except the congruence axiom for the predicate $>$).

In a similar way as Lemma 2, we obtain by a rather tedious case analysis 
over various kinds of critical pairs: 

Lemma 3. If for every pair of rules mu + s > s' and nu + t < t' from Eq U Rq 
the inequation ns + mt' > ns' + mt is contained in tr(i?^), then 
partially confluent on tr(i?^ U Eq). 

Using the same techniques as in (Waldmann [8]) and (Bachmair and Ganzin- 
ger [3]) we can now prove the following theorem. Note in particular that in the 
presence of the totality axiom cancellative inequality factoring (I) /(II) inferences 
are simplifications, hence clauses where the maximal atomic term occurs on the 
same side of two ordering literals do not produce primary rules. 

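Theorem 12. Let $N$ be a set of clauses without negative inequality literals and
without unshielded variables; suppose that $N$ is saturated up to redundancy and
contains the theory axioms Div, Inv, Nt, and all ground instances of Tot. If all
clauses of $N$, except the ground instances of Tot, are fully abstracted, and if $N$
does not contain the empty clause, then we have for every ground clause $C\theta \in \bar{N}$:

(i) $E_{C\theta} = \emptyset$ if and only if $C\theta$ is true in $\mathrm{tr}(R_{C\theta}^{\mathbb{Q}})$.

(ii) $C\theta$ is true in $\mathrm{tr}(R_\infty^{\mathbb{Q}})$ and in $\mathrm{tr}(R_D^{\mathbb{Q}})$ for every $D \succ_C C\theta$.

(iii) The relation $\to_{R_{C\theta}^{\mathbb{Q}}}$ is partially confluent on $\mathrm{tr}(R_{C\theta}^{\mathbb{Q}})$ and the relation
$\to_{R_{C\theta}^{\mathbb{Q}} \cup E_{C\theta}^{\mathbb{Q}}}$ is partially confluent on $\mathrm{tr}(R_{C\theta}^{\mathbb{Q}} \cup E_{C\theta}^{\mathbb{Q}})$.

(iv) $\mathrm{tr}(R_{C\theta}^{\mathbb{Q}})$ and $\mathrm{tr}(R_{C\theta}^{\mathbb{Q}} \cup E_{C\theta}^{\mathbb{Q}})$ satisfy the axioms Ir, Tr, Mon, K$^>$, T$^>$, and
the congruence axiom for the predicate $>$.

(v) The relation $\to_{R_\infty^{\mathbb{Q}}}$ is partially confluent on $\mathrm{tr}(R_\infty^{\mathbb{Q}})$ and $\mathrm{tr}(R_\infty^{\mathbb{Q}})$ satisfies
the axioms Ir, Tr, Mon, K$^>$, T$^>$, and the congruence axiom for the predicate $>$.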



Theorem 13. Let $N$ be a set of clauses without negative inequality literals and
without unshielded variables; suppose that $N$ is saturated up to redundancy and
contains the theory axioms Div, Inv, Nt, and all ground instances of Tot. Suppose
that all clauses of $N$, except the ground instances of Tot, are fully abstracted.
Then $N \cup \mathrm{ODAG}$ is unsatisfiable if and only if $N$ contains the empty clause.

We may assume without loss of generality that the constant $a_0$ does not occur
in non-theory input clauses and that the function symbols $-$ and $\mathit{divided\text{-}by}_n$ are
eliminated eagerly from all non-theory input clauses. In this case, no inferences 
are possible with the axioms Div, Inv, and Nt. Furthermore, one can show that 
inferences with the totality axiom Tot are always redundant (analogously to 
Bachmair and Ganzinger [3]). 
3 The Extended Calculus

3.1 Variable Elimination 

As we have mentioned in the introduction, the calculus OCInf works on clauses 
without unshielded variables, but its inference rules may produce clauses with 
unshielded variables. To make it effectively saturate a given set of clauses, it has 
to be supplemented by a variable elimination algorithm. 

In the equational case, every clause with unshielded variables can be
transformed into an equivalent clause without unshielded variables. However, in the
presence of ordering literals, this no longer holds.

Example 14. Consider the clause $C = x > a \lor x \approx b \lor x < c$. This clause is
true for every value of $x$, if either $c > a$ or both $a \approx b$ and $c \approx b$. So $C$ can be
replaced by the clause normal form of $c > a \lor (a \approx b \land c \approx b)$, that is, by the
two clauses $c > a \lor a \approx b$ and $c > a \lor c \approx b$, but $C$ is not equivalent to a single
clause without unshielded variables.
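
The claim of Example 14 can be checked by brute force on a small sample of rational values. The following Python sketch assumes that $a$, $b$, $c$ range over $\{0, 1, 2\}$ and that $x$ ranges over a grid of half-integers covering and surrounding these values; this is an illustration over the rationals, not a proof for arbitrary models.

```python
from fractions import Fraction
from itertools import product

def clause_holds(x, a, b, c):
    """The clause x > a or x = b or x < c for concrete rational values."""
    return x > a or x == b or x < c

def eliminated(a, b, c):
    """The two clauses c > a or a = b, and c > a or c = b."""
    return (c > a or a == b) and (c > a or c == b)

values = [Fraction(n) for n in range(3)]       # candidate values for a, b, c
grid = [Fraction(n, 2) for n in range(-2, 7)]  # x in {-1, -1/2, ..., 3}

# For every choice of a, b, c: the clause holds for all sampled x
# exactly when the variable-free clauses hold.
agree = all(
    all(clause_holds(x, a, b, c) for x in grid) == eliminated(a, b, c)
    for a, b, c in product(values, repeat=3)
)
```

The grid contains every value of $a$ and $c$, so whenever $c \leq a$ the sampled $x$ include a counterexample unless $a$, $b$ and $c$ coincide.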

For any disjunction of conjunctions of literals F let CNF(F) be the clause 
normal form of F (represented as a multiset of clauses). 

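Let $x$ be a variable of sort $G$. We define a binary relation $\to_x$ over multisets
of clauses by

CancelVar $M \cup \{C' \lor mx + s \mathrel{\dot\sim} m'x + s'\} \to_x M \cup \{C' \lor (m - m')x + s \mathrel{\dot\sim} s'\}$
if $m \geq m' \geq 1$.

ElimNeg $M \cup \{C' \lor mx + s \not\approx s'\} \to_x M \cup \{C'\}$
if $m \geq 1$ and $x$ does not occur in $C'$, $s$, $s'$.

ElimPos $M \cup \{C' \lor \bigvee_{i \in I} l_i x + r_i \approx r'_i \lor \bigvee_{j \in J} m_j x + s_j > s'_j \lor \bigvee_{k \in K} n_k x + t_k < t'_k\} \to_x$
$M \cup \mathrm{CNF}(C' \lor \bigvee_{j \in J} \bigvee_{k \in K} (n_k s_j + m_j t'_k > n_k s'_j + m_j t_k$
$\lor \bigvee_{i \in I} (l_i s_j + m_j r'_i \approx l_i s'_j + m_j r_i \land l_i t_k + n_k r'_i \approx l_i t'_k + n_k r_i)))$
if $I \cup J \cup K \neq \emptyset$, $l_i \geq 1$, $m_j \geq 1$, $n_k \geq 1$ and $x$ does not occur in
$C'$, $r_i$, $r'_i$, $s_j$, $s'_j$, $t_k$, $t'_k$, for $i \in I$, $j \in J$, $k \in K$.

Coalesce $M \cup \{C' \lor mx + s \not\approx s' \lor nx + t \mathrel{\dot\sim} t'\} \to_x$
$M \cup \{C' \lor mx + s \not\approx s' \lor mt + ns' \mathrel{\dot\sim} mt' + ns\}$
if $m \geq 1$, $n \geq 1$, and $x$ does not occur in $s$, $s'$, $t$, $t'$.

It is easy to show that $\to_x$ is noetherian. We define the relation $\to_{\mathrm{elim}}$ over
multisets of clauses in such a way that $M \cup \{C\} \to_{\mathrm{elim}} M \cup M'$ if and only if $C$
contains an unshielded variable $x$ and $M'$ is a normal form of $\{C\}$ with respect
to $\to_x$.

The relation $\to_{\mathrm{elim}}$ is again noetherian. For a clause $C$, $\mathrm{elim}(C)$ denotes some
(arbitrary but fixed) normal form of $\{C\}$ with respect to the relation $\to_{\mathrm{elim}}$.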
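
To see how the elimination of a variable from positive literals plays out on Example 14, one can read $x \approx b$, $x > a$ and $x < c$ as one equation, one lower bound and one upper bound on $x$, each with coefficient 1 and no further summands on the side of $x$. The following worked instance is a sketch under that reading:

```latex
\begin{align*}
&\{\, x > a \;\lor\; x \approx b \;\lor\; x < c \,\}\\
&\quad\to_x\; \mathrm{CNF}\bigl(c > a \;\lor\; (a \approx b \,\land\, c \approx b)\bigr)\\
&\quad=\; \{\, c > a \lor a \approx b,\;\; c > a \lor c \approx b \,\}
\end{align*}
```

Neither resulting clause contains $x$, so both are free of unshielded variables.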
Corollary 15. For any $C$, the clauses in $\mathrm{elim}(C)$ contain no unshielded
variables.

Lemma 4. For every $C$, $\{C\} \models_{\mathrm{ODAG}} \mathrm{elim}(C)$ and $\mathrm{elim}(C) \cup \mathrm{Tot} \models_{\mathrm{OTfCAM}} C$.
For every ground instance $C\theta$, $\mathrm{elim}(C)\theta \cup \mathrm{Tot} \models_{\mathrm{OTfCAM}} C\theta$.



3.2 Integration of the Elimination Algorithm 

Using the technique sketched so far, every clause $C_0$ can be transformed into
a set of clauses $\mathrm{elim}(C_0)$ that do not contain unshielded variables, follow from
$C_0$ and the axioms of totally ordered divisible abelian groups, and imply $C_0$
modulo $\mathrm{OTfCAM} \cup \mathrm{Tot}$. Obviously, we can perform this transformation for all
initially given clauses before we start the saturation process. However, when
clauses with unshielded variables are produced during the saturation process,
then logical equivalence is not sufficient to eliminate them. We have to require
that the transformed set of clauses $\mathrm{elim}(C_0)$ makes the inference $\iota$ producing
$C_0$ redundant. Unfortunately, it may happen that the clauses in $\mathrm{elim}(C_0)$ or
the instances of the totality axiom needed in Lemma 4 are too large, at least
for some instances of $\iota$. To integrate the variable elimination algorithm into the
base calculus, it has to be supplemented by a case analysis technique.

Let $k \in \{1, 2\}$, let $C_1, \ldots, C_k$ be clauses without unshielded variables and let
$\iota$ be an OCInf-inference

$$\frac{C_k \;\ldots\; C_1}{C_0\sigma}$$

We call the unifying substitution $\sigma$ that is computed during $\iota$ and applied to
the conclusion the pivotal substitution of $\iota$. (For ground inferences, the pivotal
substitution is the identity mapping.) If the last premise $C_1$ has the form $C'_1 \lor A$
where $A$ is maximal (and the replacement or cancellation takes place at $A$) then
we call $A\sigma$ the pivotal literal of $\iota$. Finally, if $u_0$ is the atomic term that is cancelled
out in $\iota$, or in which some subterm is replaced,^6 then we call $u_0\sigma$ the pivotal
term of $\iota$.

^6 More precisely, $u_0$ is the maximal atomic subterm of $s$ containing $u$ in standard
superposition inferences, and the term $u$ in all other inferences.

Two properties of pivotal terms are important for us: First, whenever an
inference $\iota$ from clauses without unshielded variables produces a conclusion with
unshielded variables, then all these unshielded variables occur in the pivotal term
of $\iota$. Second, no atomic term in the conclusion of $\iota$ can be larger than the pivotal
term of $\iota$.

One can now show that, if the clauses in $\mathrm{elim}(C_0)$ or the instances of the
totality axiom needed in Lemma 4 are too large to make the OCInf-inference $\iota$
redundant, then there must be an atomic term in some clause in $\mathrm{elim}(C_0)$ that
is unifiable with the pivotal term. If we apply the unifier to the conclusion of the
OCInf-inference, then the result no longer contains unshielded variables, and
moreover it subsumes the critical instances of $\iota$. Using this result, we can now
transform the inference system OCInf into a new inference system that operates
on clauses without unshielded variables and again produces such clauses. The
new system ODInf is given by two meta-inference rules:



Eliminating Inference

$$\frac{C_n \;\ldots\; C_1}{C}$$

if the following conditions are satisfied:

(i) $\dfrac{C_n \;\ldots\; C_1}{C_0}$ is an OCInf-inference.

(ii) $C \in \mathrm{elim}(C_0)$.

Instantiating Inference

$$\frac{C_n \;\ldots\; C_1}{C_0\tau}$$

if the following conditions are satisfied:

(i) $\dfrac{C_n \;\ldots\; C_1}{C_0}$ is an OCInf-inference with pivotal literal $A$ and pivotal
term $u$.

(ii) $\mathrm{elim}(C_0) \neq \{C_0\}$.

(iii) A literal $A_1$ with the same polarity as $A$ occurs in some clause in $\mathrm{elim}(C_0)$.

(iv) An atomic term $u_1$ occurs at the top of $A_1$.

(v) $\tau$ is contained in a minimal complete set of ACU-unifiers of $u$ and $u_1$.

We define the redundancy criterion for the new inference system in such a
way that an ODInf-inference is redundant if the appropriate instances of its
parent OCInf-inference are redundant. Then a set of clauses without unshielded
variables that is saturated with respect to ODInf up to redundancy is also
saturated with respect to OCInf up to redundancy. ODInf can thus be used for
effective saturation of a given set of input clauses:

Theorem 16. Let $N_0$ be a set of clauses without negative inequality literals
and without unshielded variables; let $N_0$ contain the theory axioms Div, Inv,
Nt, and all ground instances of Tot. Suppose that all clauses of $N_0$, except the
ground instances of Tot, are fully abstracted. Let $N_0 \vdash N_1 \vdash N_2 \vdash \ldots$ be a fair
ODInf-derivation. Let $N_\infty$ be the limit of the derivation. Then $N_0 \cup \mathrm{ODAG}$ is
unsatisfiable if and only if $N_\infty$ contains the empty clause.
4 Conclusions

We have presented a superposition-based calculus for first-order theorem proving
in the presence of the axioms of totally ordered divisible abelian groups. It
is based on the DTAG-superposition calculus from (Waldmann [10]) and the
ordered chaining calculus for dense total orderings without endpoints (Bachmair
and Ganzinger [3]), and it shares the essential features of these two calculi: It is
refutationally complete, it does not require explicit inferences with the theory
clauses, and due to the integrated variable elimination algorithm it does not
require variable overlaps. It thus offers an efficient way of treating equalities and
inequalities between additive terms over, e.g., the rational numbers within a
first-order theorem prover.



Acknowledgments. I would like to thank the anonymous IJCAR referees for 
Abstract. Indexing data structures have a crucial impact on the performance
of automated theorem provers. Examples are discrimination
trees, which are like tries where terms are seen as strings and common
prefixes are shared, and substitution trees, where terms keep their tree
structure and all common contexts can be shared. Here we describe a
new indexing data structure, called context trees, where, by means of a
limited kind of context variables, also common subterms can be shared,
even if they occur below different function symbols. Apart from introducing
the concept, we also provide evidence for its practical value. We
describe an implementation of context trees based on Curry terms and
on an extension of substitution trees with equality constraints, where
one also does not distinguish between internal and external variables.
Experiments with matching benchmarks show that our preliminary implementation
is already competitive with tightly coded current state-of-the-art
implementations of the other main techniques. In particular,
space consumption of context trees is significantly less than for other
index structures.



1 Introduction 

Indexing data structures have a crucial impact on the performance of theorem 
provers. The indexes have to store a large number of terms and to support the 
fast retrieval, for any given query term t, of all terms in the index satisfying 
a certain relation with t, such as matching, unifiability, or syntactic equality. 
Indexing for matching, where, to check for forward redundancy, one searches 
in the index for a generalization of the query term, is well-known to be the 
most limiting bottleneck in practice. Another aspect which is becoming more 
and more crucial is memory consumption. During the last years processor speed 
has been growing much faster than memory capacity and one may assume that 
this gap will become even wider in the coming years. At the same time memory 
access bandwidth is also becoming an important bottleneck. Excessive memory 
consumption leads to more cache faults, which become the dominant factor for
time, instead of processor speed. Therefore, in what follows we will mainly focus
on matching retrieval operations and on memory consumption.

* The second and third author are partially supported by the Spanish CICYT project
HEMOSS ref. TIC98-0949-C02-01. All test programs, implementations and benchmarks
mentioned in this paper are available at www.lsi.upc.es/~roberto.

One important aspect makes indexing techniques in theorem proving essen- 
tially different from indexing in other contexts like functional or logic program- 
ming: the index is subject to insertions and deletions. Therefore, during the last 
two decades a significant number of results on new specific indexing techniques 
for theorem proving have been published and applied in different provers. The 
currently best-known and most frequently used indexing techniques for match- 
ing are discrimination trees [1,4], the compiled variant of discrimination trees, 
called code trees [9], and substitution trees [2]. 

Discrimination trees are like tries where terms are viewed as strings and
where common prefixes are shared. A substitution tree has, in each node, a
substitution, a list of pairs x_i = t where each x_i is an internal variable and t
is a term that may contain other internal variables as well as external variables,
which are the variables in the terms to be stored.

Example 1. The two terms f(a, g(x), h(y)) and f(h(b), g(y), h(y)) will be stored
in a substitution tree and discrimination tree, respectively, as shown:

substitution tree:

            x_0 = f(x_1, g(x_2), h(y))
               /                \
    x_1 = a, x_2 = x      x_1 = h(b), x_2 = y

discrimination tree:

        f
       / \
      a   h
      |   |
      g   b
      |   |
     ...  ...

In a substitution tree all terms x_0σ are stored such that σ is the composition
of the substitutions on some path from the root to a leaf of the tree. In the
example, after inserting the first term in an empty substitution tree we obtain
the single node x_0 = f(a, g(x), h(y)). When inserting the second term, internal
variables are placed at the points of disagreement, and children are created with
the “remaining” substitutions of both. Therefore all common contexts can be
shared. □


Example 2. It is clear that the additional sharing in substitution trees avoids
repeated work (which is the main goal of all indexing techniques). Assume one has
two terms f(c, x, t) and f(x, c, t) in the index, and a query f(c, c, s), where s and
t are terms such that s is not an instance of t. Then two attempts to match s
against t will be made in a discrimination tree, and only one in a substitution
tree. But, on the other hand, in substitution trees the basic traversal algorithms
are significantly more costly. □

Here we describe a new indexing data structure, called context trees, where,
by means of a limited kind of context variables, certain common subterms can
be shared, even if they occur below different function symbols. Roughly, the
idea is that f(s) and g(s, t) can be represented as F(s, t), with children F = f
and F = g. Function variables such as F stand for single function symbols only
(although extensions to allow for more complex forms of second-order terms are
possible).

Example 3. Assume one has three terms h(x, f(t)), h(x, g(t)), and h(b, f(t)) in
the index. Then, in a discrimination tree, t will occur three times. In a substitution
tree, we will have:

                 x_0 = h(x_1, x_2)
                  /            \
            x_1 = x        x_1 = b, x_2 = f(t)
            /     \
   x_2 = f(t)   x_2 = g(t)

and with a query h(b, f(s)), the terms s and t will be matched twice against
each other (at the leftmost and rightmost leaves). In a context tree, the term t
occurs only once:

                 x_0 = h(x_1, F(t))
                  /            \
            x_1 = x        x_1 = b, F = f
            /     \
        F = f    F = g

and if s does not match t, the failure with the query h(b, f(s)) will be found at
the root. □

In addition to proposing the concept of context trees, in this paper we will 
also provide some evidence for its practical value. First, we show how they can 
be adequately implemented. In order to be able to reuse some of the main ideas 
for efficient implementation of substitution trees, we will consider terms built 
from a single pairing constructor and constants. These terms will also be called 
Curry terms. We describe an implementation based on these Curry terms and 
an extension of substitution trees by equality constraints and by not distinguish- 
ing internal and external variables. The second evidence for its practical value 
is empirical. Experiments with matching show that our preliminary implemen- 
tation (which does not yet include several important enhancements) is already 
competitive with tightly coded state-of-the-art implementations, namely the im- 
plementation of discrimination trees of the Waldmeister prover [3] and the code 
trees of the Vampire prover [9]. 

For the experiments, we adopted the methods for evaluation of indexing techniques
described in [5]: (i) we use 30 very large benchmarks containing the exact
sequence of (update and retrieval) operations on the matching index that take
place when running three well-known state-of-the-art provers on a selected set of
10 problems; (ii) comparisons are made with the discrimination tree implementation
of the Waldmeister prover [3], and the code trees of the Vampire prover
[9], as provided by their own implementors using the test driver of [5].

This paper is structured as follows. Section 2 introduces some basic con- 
cepts of indexing, discrimination trees and substitution trees. In Section 3 we 
outline some problems with direct implementations of context trees and explain 
how one can use Curry terms to solve these problems. We also show that the
use of Curry terms has several additional advantages. In Section 4 we describe
our implementation in some detail. Finally, Sections 5 and 6 describe the
experimental results and some promising directions for future work. 



2 Discrimination Trees and Substitution Trees 

Discrimination trees can be made very efficient if query terms are linear (as the 
terms in the trees are). Usually, queries are the so-called flatterms of [1], which 
are linked lists with additional pointers to jump over subterms t when a variable 
of the index gets instantiated by t. 

In standard discrimination trees, all variables are represented by a single
variable symbol *, so that different terms such as f(x, y) and f(x, x) are both
represented by f(*, *), and the corresponding path in the tree is common to both.
This increases the amount of sharing, and also the retrieval speed, because the
low-level operations (basically symbol comparison and variable instantiation)
are very simple. But it is only a prefilter: once a possible match has been found,
additional equality tests have to be performed between the query subterms by
which the variables of terms like f(x, x) have been instantiated. Nodes are usually
arrays of pointers indexed by the function symbols, plus one additional pointer
for *. If, during matching, the query symbol currently treated is f, then one can
directly jump to the child for f, if it exists, or to the one of *. Especially for
larger signatures, this kind of node leads to high memory consumption. Note
that the case where children for both f and * exist is the only situation where
backtracking points are created.
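
The node layout just described can be pictured with a small C sketch. It is only an illustration under assumptions: the names DNode, NSYMS, dnode_candidates and dnode_pointer_bytes are hypothetical and not taken from any prover, and skipping a query subterm when following the * child is left out.

```c
#include <assert.h>
#include <stddef.h>

#define NSYMS 4   /* hypothetical signature size */

/* One node of a standard discrimination tree: one child slot per
   function symbol, plus one slot for the collapsed variable symbol *. */
typedef struct DNode {
    struct DNode *child[NSYMS];  /* indexed by function symbol */
    struct DNode *star;          /* child for * */
} DNode;

/* Children to try for the current query symbol. Returns how many
   candidates were written to out[]; two candidates means that a
   backtracking point has to be created. */
int dnode_candidates(const DNode *n, int sym, const DNode *out[2]) {
    int k = 0;
    assert(sym >= 0 && sym < NSYMS);
    if (n->child[sym]) out[k++] = n->child[sym];
    if (n->star)       out[k++] = n->star;
    return k;
}

/* Bytes taken by the child pointers of a single node: this grows with
   the signature size even when most slots are empty. */
size_t dnode_pointer_bytes(void) {
    return (NSYMS + 1) * sizeof(DNode *);
}
```

With a larger NSYMS every node pays for all slots, which is the memory cost mentioned above.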

In perfect discrimination trees, variables are not collapsed into a single symbol.
Instead, nodes of different sizes exist: apart from the function symbols, each
node can have a child for any of the variables that already occurred along the 
path in the tree, plus an additional child for a possible new variable. Hence even 
more memory is needed in this approach. Also there is less sharing in the index. 
On the other hand, the equality tests are not delayed (which is good according 
to the first-fail principle; see also below), all matches found are correct and no 
later equality tests are needed. The Waldmeister prover [3] uses these perfect 
discrimination trees for matching. 



2.1 Implementation Techniques for Substitution Trees 

Let us now consider substitution trees in more detail. They were introduced by 
Peter Graf [2], who also developed an implementation that is still used in the 
Spass prover [10]. (A more efficient implementation was given in the context of
the Dedam (Deduction abstract machine) kernel of data structures [6], and has 
served as a basis for our implementation of context trees as well.) 

As for discrimination trees, it is important to deal with an adequate representation
of query terms. In Dedam, Prolog-like terms are used: each term
f(t_1, ..., t_n) is represented by n + 1 contiguous heap cells with a tag and an
address field:

    a        f
    a + 1    ref   a_1
    ...
    a + n    ref   a_n

where each address field a_i points to the subterm t_i, and (uninstantiated) variables
are ref’s pointing to themselves. In this setting, contiguous heap cell blocks
of different sizes co-exist, and traversal of terms requires controlling arities.
Term-to-term operations like matching or unification only instantiate self-referencing
ref positions. If these instantiated positions are pushed on a stack, called the
refstack, then undoing the operation amounts to restoring the positions in the
refstack to self-references again.
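
The refstack discipline can be sketched in a few lines of C. This is a simplified model under assumptions, not the Dedam code: cells hold only an address, and the names heap, bind_ref and undo_to are hypothetical.

```c
#include <assert.h>

#define HEAP_SIZE 64
#define STACK_SIZE 64

/* heap[i] holds the address a ref cell points to; an uninstantiated
   variable is a ref pointing to itself. */
static int heap[HEAP_SIZE];
static int refstack[STACK_SIZE];
static int refstack_top = 0;

void make_var(int addr)  { heap[addr] = addr; }
int  is_unbound(int addr) { return heap[addr] == addr; }

/* Instantiate a self-referencing ref and remember it on the refstack. */
void bind_ref(int var, int target) {
    assert(is_unbound(var) && refstack_top < STACK_SIZE);
    heap[var] = target;
    refstack[refstack_top++] = var;
}

/* Undo all instantiations made above the given refstack height. */
void undo_to(int height) {
    while (refstack_top > height) {
        int var = refstack[--refstack_top];
        heap[var] = var;   /* back to a self-reference */
    }
}
```

Recording the value of refstack_top before an operation and passing it to undo_to afterwards restores exactly the positions that operation instantiated.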

Substitutions in substitution trees are always pairs of heap addresses; each 
right hand side points to a term; each left hand side points to an internal variable 
(i.e., a self-ref position) occurring exactly once in some term at the right hand 
side of a substitution along the path to the root. 

The basic idea for all retrieval operations (finding a term, matching, unification)
in substitution trees is the same: one instantiates the internal variable x_0
at the root with the query term, and traverses the tree, where at each visited
node with a substitution x_1 = t_1, ..., x_n = t_n, one performs the basic term-to-term
operation (syntactic equality, matching, unification) between each (already
instantiated) x_i and its corresponding t_i. The term-to-term operations only differ
in which variables are allowed to be instantiated, and which variables are
considered as constants: for finding terms (syntactic equality), only the internal
ref’s (called intref) can be instantiated; for matching, also the external ref’s
of the index (but not of the query); for unification, all ref’s can be instantiated.

Upon failure, backtracking occurs. After successfully visiting a node, before 
continuing with its first child, its next sibling is stored in the backtracking stack, 
together with the current height of the refstack. Therefore, for backtracking, 
one pops the next node to visit from the backtracking stack, together with its 
corresponding refstack height, and restores all ref positions above this height. 
A failure occurs when trying to backtrack on an empty backtracking stack. 

Due to space limitations, we cannot go into details here about the update op- 
erations for substitution trees. Let us only mention that several insertion strate- 
gies are possible (first-fit, best-fit), and that the basic operation for insertion is 
the computation of the common part and remainders of two substitutions. For 
deletions, one sometimes needs to apply the reverse operation, namely to merge 
two substitutions into a single one. 



Example 4. Let σ_1 be the substitution {x_1 = g(a, h(b)), x_2 = a} and let σ_2
be {x_1 = g(b, h(c)), x_2 = b}. Their common part is {x_1 = g(x_3, h(x_4))}. The
remainders of both substitutions are {x_2 = a, x_3 = a, x_4 = b} and {x_2 = b, x_3 = b,
x_4 = c}, respectively. □
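
For a single pair of right hand sides, such as the two terms bound to x_1 in Example 4, the common part can be computed by a simultaneous traversal. The following C sketch is one possible way to do this under assumptions: it works on plain first-order terms rather than heap cells, and Term, common_part, term_to_string and the remainder arrays are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ARGS 3
#define MAX_NODES 64

/* A first-order term: a symbol name and up to MAX_ARGS arguments. */
typedef struct Term {
    const char *sym;
    int arity;
    const struct Term *arg[MAX_ARGS];
} Term;

static Term pool[MAX_NODES];      /* storage for newly built terms */
static char names[MAX_NODES][8];  /* names of new internal variables */
static int used = 0;

/* Remainders: variable rem_var[k] stands for rem_s[k] in the first
   term and for rem_t[k] in the second one. */
static const Term *rem_var[MAX_NODES], *rem_s[MAX_NODES], *rem_t[MAX_NODES];
static int n_rem = 0;

/* Common part of s and t; *next_var numbers the new internal variables. */
const Term *common_part(const Term *s, const Term *t, int *next_var) {
    Term *r;
    int i;
    assert(used < MAX_NODES);
    r = &pool[used];
    if (strcmp(s->sym, t->sym) == 0 && s->arity == t->arity) {
        used++;
        r->sym = s->sym;
        r->arity = s->arity;
        for (i = 0; i < s->arity; i++)
            r->arg[i] = common_part(s->arg[i], t->arg[i], next_var);
        return r;
    }
    /* Point of disagreement: place a new internal variable here and
       move both subterms to the remainders. */
    snprintf(names[used], sizeof names[used], "x%d", (*next_var)++);
    r->sym = names[used];
    r->arity = 0;
    used++;
    rem_var[n_rem] = r; rem_s[n_rem] = s; rem_t[n_rem] = t;
    n_rem++;
    return r;
}

/* Append a term as text to buf, e.g. g(x3,h(x4)); buf must already
   hold a valid (possibly empty) string. */
void term_to_string(const Term *t, char *buf, size_t size) {
    int i;
    strncat(buf, t->sym, size - strlen(buf) - 1);
    if (t->arity == 0) return;
    strncat(buf, "(", size - strlen(buf) - 1);
    for (i = 0; i < t->arity; i++) {
        if (i > 0) strncat(buf, ",", size - strlen(buf) - 1);
        term_to_string(t->arg[i], buf, size);
    }
    strncat(buf, ")", size - strlen(buf) - 1);
}
```

Called on g(a, h(b)) and g(b, h(c)) with the variable counter starting at 3, the sketch yields g(x3,h(x4)) and records the pairs (a, b) for x3 and (b, c) for x4, in line with Example 4. Pairs such as x_2 = a and x_2 = b, which disagree at the top, simply stay in the remainders.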

2.2 Substitution Trees for Matching 

In Dedam a special version of substitution trees for matching has been developed, 
which is about three times faster than the general-purpose implementation in 
Spass and Dedam. 

Example 5. Suppose the query is of the form f(s, t) and consider a substitution
tree with the two terms f(x, x) and f(a, x): the root is x_0 = f(x_1, x), with
children x_1 = x and x_1 = a. When matching, at the root x_1 gets instantiated
with s, and x with t; then, at the leftmost child, the terms s and t are matched
against each other. Note that one has to keep track of whether or not x has
already been instantiated, i.e., one has to keep a refstack. □

The idea for improving this procedure is similar to the one of the standard 
variant of discrimination trees: external variables are all considered to be dif- 
ferent. But in substitution trees the advantages are more effective: the refstack 
becomes unnecessary (and hence also the information about its height in the 
backtracking stack), because one can always overwrite the values of the internal
and external variables and restoration becomes unnecessary. Matching operations
between query subterms, like s and t in the previous example, are replaced
by a cheaper syntactic equality test of the equality constraints at the leaves. 



3 Context Trees 



We start by illustrating the increased amount of sharing in context trees as 
intuitively described in Section 1 compared with substitution trees. 

Example 6. Assume in a context tree we have a subtree T below a node x_i =
f(x_j, t) (depicted below at the left) where we have to insert x_i = g(s, t, u). Then
the common part is x_i = F(x_j, t, u), the remaining parts are {F = f} and
{F = g, x_j = s}, respectively, and we obtain:

    x_i = f(x_j, t)              x_i = F(x_j, t, u)
           |                        /          \
           T                    F = f      F = g, x_j = s
                                  |
                                  T

During retrieval on the new tree, the term-to-term operations have to be guided
by the arities of the query: if x_i is instantiated with a query term headed with
f, when arriving at the node x_i = F(x_j, t, u), then, since the arity of f is 2, one
can simply ignore the term u of this common part. □
It is not difficult to see that, with the restricted kind of function variables F
that stand for single function symbols, the common part of two terms s and t, as
in substitution trees, will contain the entire common context, and additionally
also those subterms u that occur at the same position p in both terms, that is,
for which u = s|_p = t|_p.

Example 7. The common part of the two terms f(g(b, b), a, c) and h(h(b, c), d)
is F(G(b, x_1), x_2, x_3). Indeed, the subterm b at position 1.1 is the only term
occurring at the same position in both terms. □

Implementing context trees for matching as an extension of the specialized
substitution trees for matching requires dealing with the specific properties of
the context variables.

Example 8. Consider again Example 6. The term f(x_j, t) consists of three contiguous
heap cells. The first contains f, the second is an intref corresponding
to x_j, and the third is a ref pointing to the subterm t. Initially, in the subtree T
below that node, along each path to a leaf x_j appears once as a left hand side in a
substitution. After inserting x_i = g(s, t, u), the common part is x_i = F(x_j, t, u),
and the new term F(x_j, t, u) needs four contiguous heap cells instead of three.

A serious implementation problem now is that, if we allocate a new block of
size four, all left hand sides pointing to x_j in the subtree T have to be changed
to point to the new address of x_j. □



3.1 Context Trees through Curry Terms 

A simple solution to the previous problem would be to always use blocks corre- 
sponding to the maximal arity of symbols, but this is too expensive in memory 
consumption. Here we propose a different solution, which is conceptually ap- 
pealing and at the same time turns out to be very efficient since it completely 
avoids the need for checking arities. We suggest representing all terms in Curry
form. Curry terms are formed with a single binary apply symbol @, and all other 
function symbols are considered (second-order) constants to be treated much like 
their first-order counterparts. This idea is standard in the context of functional 
programming, but, surprisingly, does not seem to have been considered for term 
indexing data structures for automated deduction before. 

Example 9. Consider again the terms of Example 6, where we saw that one
can share the term t in f(x_j, t) and g(s, t, u) by having F(x_j, t, u). In Curry
form, these terms become @(@(f, x_j), t) and @(@(@(g, s), t), u) and the t cannot
be shared. But in the Curry form the same amount of sharing exists: still all
arguments that are in the same position are shared, assuming that positions
are counted from right to left. Consider the arguments of the same terms in
reverse order. Then we have f(t, x_j) and g(u, t, s), which in Curry form become
@(@(f, t), x_j) and @(@(@(g, u), t), s). The common part, which was F(u, t, x_j),
can be computed on the Curry terms exactly as it was done for common contexts
of first-order terms in substitution trees. In the example we get @(@(x_k, t), x_j),
where the remaining parts are {x_k = f} and {x_k = @(g, u), x_j = s}.

It is not difficult to see that in this way one obtains exactly the same amount
of sharing as with context variables: all common contexts and all subterms u
that occur at the same position in both terms (but remember: if positions are
computed from right to left; for instance, the shared t in f(t, x_j) and g(u, t, s) is
at position 2 in both terms). □

An important additional advantage is that the basic algorithms do not de- 
pend on the arities of the symbols anymore. Moreover, since it is obviously not 
necessary to store any apply symbols, all memory blocks contain exactly two 
heap cells. 

Example 10. The term f(b, g(x)) becomes @(@(f, b), @(g, x)), which can be written
in pair notation simply as ((f, b), (g, x)). Compare the Prolog format with
how the Curry term can be stored:

Prolog term:

    10 f
    11 ref -> 40 b
    12 ref -> 60 g
              61 ref -> 80 ref

Curry term:

    100 ref -> 120 ref -> 150 f
               121 ref -> 140 g     151 b
                          141 var

Note that in the Prolog term, constants are a block of heap cells on their own
such as the b at address 40. Alternatively, constants can also be placed directly
at pointer position for minimizing space. But with Prolog terms this makes the
algorithms slower as a uniform treatment for all function symbols (constants or
not) becomes impossible. But in Curry terms, since all function symbols are
constants, this space optimization can be used without any cost. □
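
The translation into pair notation used in Example 10 can be sketched as follows. This is an illustration under assumptions: it prints the Curry form of a term given as a plain C structure instead of building heap cells, and Term, put, curry_prefix and curry_to_string are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define MAX_ARGS 3

/* A first-order term: a symbol name and up to MAX_ARGS arguments. */
typedef struct Term {
    const char *sym;
    int arity;
    const struct Term *arg[MAX_ARGS];
} Term;

/* Append text to buf without overflowing it. */
static void put(char *buf, size_t size, const char *text) {
    assert(strlen(buf) < size);
    strncat(buf, text, size - strlen(buf) - 1);
}

static void curry_prefix(const Term *t, int k, char *buf, size_t size);

/* Append the Curry form of t in pair notation; buf must already hold
   a valid (possibly empty) string. */
void curry_to_string(const Term *t, char *buf, size_t size) {
    curry_prefix(t, t->arity, buf, size);
}

/* Curry form of f applied to its first k arguments only: the symbol f
   for k = 0, otherwise the pair (form for k-1 arguments, argument k). */
static void curry_prefix(const Term *t, int k, char *buf, size_t size) {
    if (k == 0) { put(buf, size, t->sym); return; }
    put(buf, size, "(");
    curry_prefix(t, k - 1, buf, size);
    put(buf, size, ",");
    curry_to_string(t->arg[k - 1], buf, size);
    put(buf, size, ")");
}
```

Every opening parenthesis corresponds to one block of two heap cells, so a symbol of any arity is handled by the same two-cell step.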

In Curry terms each cell is either a constant, a variable var or a ref to a 
subterm. Curry terms are always headed by a single heap cell, and all other 
blocks consist of two contiguous heap cells. This makes the basic algorithms 
very efficient, as exemplified by the following recursive algorithm for testing the 
equality of two (ground) Curry terms: 

int TermEqual(_HeapAddr s, _HeapAddr t){
  if (HeapTag(s) != HeapTag(t)) return(0);
  if (HeapIsRef(s)){
    /* both cells point to blocks of two contiguous cells */
    if (!TermEqual(HeapAddr(s), HeapAddr(t))) return(0);
    if (!TermEqual(HeapAddr(s)+1, HeapAddr(t)+1)) return(0);}
  return(1); }
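
To try the equality test in isolation, one can model the heap with a small array. The following C sketch is a self-contained approximation under assumptions: the cell layout (a tag plus one integer) and the names Cell, new_cell, new_pair and term_equal are hypothetical and are not the Dedam definitions behind _HeapAddr and HeapTag.

```c
#include <assert.h>

#define HEAP_SIZE 256

/* Assumed cell layout: a tag plus one integer, which is a symbol id
   for constants and a heap address for refs. */
typedef enum { TAG_CONST, TAG_REF } Tag;
typedef struct { Tag tag; int val; } Cell;

static Cell heap[HEAP_SIZE];
static int heap_top = 0;

Cell constant(int sym) { Cell c = { TAG_CONST, sym }; return c; }
Cell ref_to(int addr)  { Cell c = { TAG_REF, addr };  return c; }

/* Store one cell and return its address. */
int new_cell(Cell c) {
    assert(heap_top < HEAP_SIZE);
    heap[heap_top] = c;
    return heap_top++;
}

/* Store a block of two contiguous cells; return the address of the first. */
int new_pair(Cell left, Cell right) {
    int addr = new_cell(left);
    new_cell(right);
    return addr;
}

/* Equality of two ground Curry terms, given by the addresses of their cells. */
int term_equal(int s, int t) {
    if (heap[s].tag != heap[t].tag) return 0;
    if (heap[s].tag == TAG_CONST) return heap[s].val == heap[t].val;
    /* Both cells are refs: compare the two cells of the referenced blocks. */
    return term_equal(heap[s].val, heap[t].val)
        && term_equal(heap[s].val + 1, heap[t].val + 1);
}
```

No arity is consulted anywhere: a ref always leads to exactly two cells.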



4 Implementation 

4.1 Equality Constraints 

In order to exploit the idea of equality constraints in its full power, it is important 
to perform the equality tests not only at the leaves, but as high up as possible 
in the tree without decreasing the amount of sharing (see [6]). For example, if 
we have f(a, x, x, x, a), f(b, x, x, x, a), and f(c, y, y, x, a), then the tree can be:

            x_0 = f(x_1, x, y, z, a), x = y
               /             \
            x = z          x_1 = c
           /     \
     x_1 = a   x_1 = b

Note that placing the equality tests x = y and x = z in the leaves would 
frequently lead to repeated work during retrieval time. Also, according to the 
first-fail principle (which is strongly recommended in indexing techniques), it is 
important to impose strong restrictions like the equality of two whole subterms 
as soon as possible. Below we outline some details about our implementation 
of equality constraints, their evaluation during retrieval time and their creation 
during insertions by means of MF-sets (merge-find sets). 



4.2 Internal vs. External Variables 



We have seen that in our Curry terms we only consider heap cells that are a 
constant, a ref, or a variable var. Indeed, it turns out that the usual distinction 
between internal and external variables can also be dropped. (A variable that 
is not instantiated represents an external variable.) This leads to even more 
sharing in the index and increases matching retrieval speed, however, at the 
price of significantly more complex update operations (see below). 



Example 11. If the tree contains the two terms f(a, a) and f(x, b), we have:

distinguishing internal and external vars.:

              x_0 = f(x_1, x_2)
               /            \
    x_1 = a, x_2 = a    x_1 = x, x_2 = b

no distinctions:

              x_0 = f(x_1, x_2)
               /            \
    x_1 = a, x_2 = a      x_2 = b

Note that in the second tree the variable x_1 plays the role of an internal variable
in the leftmost branch and of an external variable in the other one. □

In this setting, the term-to-term matching operation can be implemented as 
follows: 

int TermMatch(_HeapAddr query, _HeapAddr set){
  /* a variable of the index is simply (re)instantiated with the query subterm */
  if (HeapIsVar(set)) { HeapSetAddr(set, query); return(1); }
  if (HeapTag(query) != HeapTag(set)) return(0);
  if (HeapIsRef(set)){
    if (!TermMatch(HeapAddr(query), HeapAddr(set))) return(0);
    if (!TermMatch(HeapAddr(query)+1, HeapAddr(set)+1)) return(0);}
  return(1); }
0 = ((f 1) 2)
    1 = a
        2 = a
        2 = c
    1 = ((f 3) 4), 3 = 4
        2 = a
        2 = c
        2 = ((f 5)((f 6) 7)), 3 = 5
            7 = a, 3 = 6
            7 = b
    2 = b

1) f(a, a)                          ((f a) a)
2) f(a, c)                          ((f a) c)
3) f(f(0, 0), a)                    ((f ((f 0) 0)) a)
4) f(f(0, 0), c)                    ((f ((f 0) 0)) c)
5) f(f(0, 0), f(0, f(0, a)))        ((f ((f 0) 0)) ((f 0) ((f 0) a)))
6) f(f(0, 0), f(0, f(1, b)))        ((f ((f 0) 0)) ((f 0) ((f 1) b)))
7) f(0, b)                          ((f 0) b)

Fig. 1. Context tree for the terms 1)-7)



4.3 Matching Retrieval 

In Fig. 1 we show a tree as it would have been generated in our implementation 
after inserting the seven terms given both in standard representation and Curry 
form, respectively. Variables are written as numbers. Note that the equality 
constraints 3 = 4 and 3 = 5 are shared among several branches. Given the 
term-to-term operations for equality and matching from above, the remaining 
code needed for matching retrieval on a context tree is very simple. One needs 
a function for checking the substitution of a context tree node during matching: 

int SubstMatch(_Subst subst){
  while (subst){
    if (subst->IsEqualityConstraint)
      { if (!TermEqual(subst->lhs, subst->rhs)) return(0); }
    else
      { if (!TermMatch(subst->lhs, subst->rhs)) return(0); }
    subst = subst->next; }
  return(1); }

Finally, the general traversal algorithm of the tree is the one presented below (assuming
that the root variable x_0 has already been instantiated with the query):

int CTreeMatch(_CTree tree){
  if (!SubstMatch(tree->subst)){
    if (tree->nextSibling) return(CTreeMatch(tree->nextSibling));
    else return(0); }
  if (!tree->firstChild) return(1);
  if (!tree->nextSibling) return(CTreeMatch(tree->firstChild));
  return(CTreeMatch(tree->nextSibling) || CTreeMatch(tree->firstChild)); }
This is the only retrieval algorithm that is not coded in our implementation as
shown here. In the implementation, it is iterative and uses a backtracking stack.
The other algorithms, for term-to-term equality and matching, are
recursive in our current implementation, in the form we have given them above.
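
An iterative traversal with an explicit backtracking stack could look roughly as follows. This is a sketch under assumptions and not the actual implementation: the field matches is a hypothetical stand-in for the result of SubstMatch on a node, and the first child is visited before the stored sibling, as described in Section 2.1.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 128

typedef struct CTree {
    int matches;                 /* stand-in for SubstMatch(tree->subst) */
    struct CTree *firstChild;
    struct CTree *nextSibling;
} CTree;

/* Returns 1 if some path from the given node to a leaf consists of
   matching nodes only, and 0 otherwise. */
int ctree_match_iter(CTree *tree) {
    CTree *stack[MAX_DEPTH];     /* siblings still to be tried */
    int top = 0;
    while (tree) {
        if (tree->matches) {
            if (!tree->firstChild) return 1;        /* leaf reached */
            if (tree->nextSibling) {                /* remember alternative */
                assert(top < MAX_DEPTH);
                stack[top++] = tree->nextSibling;
            }
            tree = tree->firstChild;
        } else if (tree->nextSibling) {
            tree = tree->nextSibling;
        } else {
            tree = top > 0 ? stack[--top] : NULL;   /* backtrack or fail */
        }
    }
    return 0;
}
```

The stack holds at most one entry per level of the current path, so its size is bounded by the depth of the tree. Because variables are overwritten rather than restored, no refstack height needs to be stored with the entries.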

4.4 Updates 

Updates are significantly more complex in context trees than in the standard 
substitution trees. 

For insertion, one starts with a linearized term, together with several MF-sets
for keeping the information about the equivalence classes of the variables.
For instance, the term f(x, y, x, y, y) is inserted as f(x_1, x_2, x_3, x_4, x_5) with the
associated information that x_1 and x_3 are in the same class, and that x_2, x_4,
and x_5 are in the same class. Hence if this term is inserted in an empty tree, we
obtain a tree with one node containing the substitution:

x_0 = f(x_1, x_2, x_3, x_4, x_5), x_1 = x_3, x_2 = x_4, x_2 = x_5

(or with other, equivalent but always non-redundant, equality constraints).
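
The linearization step with MF-sets can be sketched in C as follows. This is a simplified illustration under assumptions: variables are single characters, only a flat argument list is handled, and mf_find, mf_merge and linearize are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define MAX_VARS 16

static int parent[MAX_VARS + 1];  /* merge-find forest over positions 1..n */

/* Representative of the class of position i. */
int mf_find(int i) {
    while (parent[i] != i) i = parent[i];
    return i;
}

/* Merge the classes of i and j; the smaller index stays representative. */
void mf_merge(int i, int j) {
    int ri = mf_find(i), rj = mf_find(j);
    if (ri < rj) parent[rj] = ri; else parent[ri] = rj;
}

/* Linearize the argument list vars (one character per variable
   occurrence): position k gets the fresh variable x_k, and positions
   holding the same original variable are merged into one class.
   Writes one constraint x_lhs = x_rhs per non-representative position
   and returns the number of constraints. */
int linearize(const char *vars, int lhs[], int rhs[]) {
    int n = (int)strlen(vars), count = 0, i, j;
    assert(n <= MAX_VARS);
    for (i = 1; i <= n; i++) parent[i] = i;
    for (j = 2; j <= n; j++)
        for (i = 1; i < j; i++)
            if (vars[i - 1] == vars[j - 1]) { mf_merge(i, j); break; }
    for (j = 1; j <= n; j++) {
        int r = mf_find(j);
        if (r != j) { lhs[count] = r; rhs[count] = j; count++; }
    }
    return count;
}
```

For the argument list of f(x, y, x, y, y), passed as the string xyxyy, this produces the three constraints x_1 = x_3, x_2 = x_4 and x_2 = x_5. The set is non-redundant because it contains one constraint per position that is not the representative of its class.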

The insertion process in a non-empty tree first searches for the node where 
insertion will take place. This search process is like matching, except for two 
aspects. Firstly, the external variables of the index are only allowed to be in- 
stantiated with variables of the inserted term. But since a variable x in the tree 
sometimes plays the role of an internal and external variable at the same time 
(see Example 11), one cannot know in advance which situation applies until a
leaf is reached: if x has no occurrence as the left hand side of a substitution (not
an equality constraint) along the path to the leaf, then it plays the role of an
external variable for this leaf. Secondly, during insertion the equality constraints
are checked on the associated information about the variable classes, instead of
checking the syntactic equality of subterms. 

If a siblings list is reached where no sibling has a total agreement with the 
inserted term then two different situations can occur. If there is a sibling with 
a partial agreement with the inserted term, then one takes the first such sibling 
(first-fit, this is what we do) or the sibling with the maximal (in some sense) 
agreement (best-fit). The substitution of this node is replaced with the common 
part (including the common equality constraints) , and two new nodes are created 
with the remaining substitutions. If a point is reached where all sibling nodes 
have an empty common part with the inserted substitution, then the inserted 
substitution is added to the siblings list. In both situations, the remaining sub- 
stitution of the inserted term is built including the equality relations that have 
not been covered by the equality constraints encountered along the path from 
the root. 

Deletion is also tricky, mainly because finding the term to be deleted again
requires controlling the equality constraints and instantiating external variables
only with variables of the term to be found. Moreover, unlike what happens
in insertion, backtracking is needed for finding.

It is important to be aware of the fact that updates cost little time in practice,
because updates are relatively infrequent compared with retrieval operations.
Experiments seem to confirm that there are only one or two updates per thousand
retrievals. In none of the benchmarks of [5] (see below) did updates take
more than 5 percent of the time.

5 Experiments 

In our experiments we have adopted the methodology described in [5]. (In that 
paper one can find a detailed discussion of how to design experiments for the 
evaluation of indexing techniques so that they can be repeated and validated 
by others without difficulty.) For the purposes of the present paper, we ran 
30 very large benchmarks, each containing the exact sequence of (thousands 
of update and millions of retrieval) operations on the matching index as they 
are executed when running one of three well-known state-of-the-art provers on 
certain problems drawn from various subsets of the TPTP problem database
[8]. Comparisons are made between our preliminary implementation (column
“Cont.” in Fig. 2 below) and the discrimination tree implementation (column
“Disc.”) of the Waldmeister prover [3], and the code trees (column “Code”) of
the Vampire prover [9], as provided and run by their own implementors.

Fig. 2 shows that, in spite of the fact that our implementation can
be much further improved (see Section 6), it is already quite competitive in
time. Moreover, context trees are, as expected, best in space, except for the very
small indexes (mostly coming from the Waldmeister prover). (A substantial further
space improvement can be expected from a compiled implementation as
sketched in Section 6.2.) Code trees are, conceptually, a refined form of standard
discrimination trees. In their latest version [7], code trees apply a similar treat- 
ment of the equality tests as the one of [6] we use here. The faster speed of code 
trees is, in our opinion, by and large due to the compilation of the index (see 
Section 6). We do not include here the results of the aforementioned Spass and 
Dedam implementations of substitution trees, because, with a similar degree of 
refinement of coding as our current implementation of context trees, they are at 
least a factor three slower and need much more space. 

6 Conclusions and Future Work 

The concept of context trees has been introduced and we have shown (and
experimentally verified) that large space savings are possible compared with
substitution trees and discrimination trees. We have described in detail how
these trees can be efficiently implemented. By representing terms in Curry form, 
an implementation can be based on a simplified variant of substitution trees. 
Already from the performance of our first (unfinished) implementation it can be 
seen that context trees have a great potential for applications in automated theo- 
rem proving. Due to the high degree of sharing, they allow for efficient matching, 
they require much less memory, and yet the time needed for the somewhat more 
complex updates remains negligible. 
Fig. 2. Experimental results



With respect to our implementation, more work remains to be done regarding 
a tighter coding of the four (two of them recursive) algorithms used for retrieval 
— those we have seen in this paper. Experience with other implementations of 
term indexes has shown that this can give substantial factors of speedup. 

Apart from these low-level aspects we also believe that there are at least two 
other directions for further work from which further substantial improvements 
will be obtained. We are going to describe them briefly now. 



6.1 Exact Computation of Backtracking Nodes 

Far more information than we have discussed so far can be precomputed at
update time on the index. We describe one of the more promising ideas, which
should help to reduce considerably the number of nodes visited at retrieval
time and to eliminate the need for a backtracking stack.

Consider an occurrence p of a substitution pair x_j = t in a substitution of the
tree. Denote by accum(p) the term that is, roughly speaking, the accumulated
substitution from the root to the pair p, including p itself. If, during
matching, a failure occurs just after the pair p, then the query term is an
instance of accum(p) (and this is the most general statement one can make at
that point for all possible queries). This knowledge can be exploited to
determine exactly the node to which one should backtrack. Let p' be the first
pair after p in a preorder traversal of the tree whose associated term
accum(p') is unifiable with accum(p). Then accum(p') is the “next” term in the
tree that can have a common instance with accum(p). Therefore, accum(p') is
precisely the next term in the tree of which the query can be an instance as
well. Hence the backtracking node to which one should jump in this situation
is the one just after p'.

It seems possible to recompute locally, upon each update of the tree, the
backtracking pointers associated with each substitution pair, and to store
these pointers at the pair itself, thus actually minimizing (in the strictest
sense of the word) the search during matching. We are currently working out
this idea in more detail.

6.2 Compiled Context Trees 

One of the conclusions that can be drawn from the experiments of [5] is that 
it does in fact pay off to compile an index into a form of interpreted ab- 
stract instructions as suggested by the code trees method of [9] (similar find- 
ings have also been obtained in the field of logic programming). For context 
trees, consider for example again the SubstMatch loop we saw before. Instead 
of such a loop, one can simply use a linked list of abstract code instructions like 
TermEqual(address1,address2,FailureAddress) where FailureAddress is the
address to jump to in case of failure. The main advantage of this approach is that 
no control (like the outermost if statement of SubstMatch) has to be looked 
up, and, since the correct address arguments are already part of the abstract 
code, no instructions like subst = subst->next are needed and many indirect 
accesses like subst->lhs can be avoided. 

In addition, one can use instructions decomposing operations like TermMatch 
into sequences of instructions for the concrete second argument, which is known 
at compile (i.e., index update) time. Assume we specialized the TermMatch
function for matching with the index term f(g(x)), that is,

    10 ref -> 20 f
              21 ref -> 30 g
                        31 var

This would give code such as the following sequence of 7 one-argument
instructions:

    if (!HeapIsRef(query)) goto fail;
    query = HeapAddr(query);
    if (HeapTag(query) != 'f') goto fail;
    if (!HeapIsRef(query+1)) goto fail;
    query = HeapAddr(query+1);
    if (HeapTag(query) != 'g') goto fail;
    HeapSetAddr(31, query+1);

By simple instruction counting, this code is easily shown to be far more
efficient on an average query term than the general-purpose two-argument
TermMatch function with f(g(x)) as second argument. Together, these advantages
give more speedup than is lost to the overhead of interpreting the operation
code of the abstract instructions. That interpretation is just a switch
statement, which, in all modern compilers, produces constant-time code.
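
As a rough illustration of such an interpreter loop, the following C sketch runs the seven instructions for f(g(x)) through a switch statement. The cell layout (Cell), the opcode names, the array MATCH_F_G_X and the function run are assumptions made for this example only; they are not taken from code trees or from any implementation discussed here, and the heap is assumed to be well-formed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heap cell: a reference, a function symbol, or a variable. */
typedef enum { CELL_REF, CELL_SYM, CELL_VAR } CellKind;
typedef struct { CellKind kind; char tag; int addr; } Cell;

/* Hypothetical one-argument abstract instructions. */
typedef enum {
    OP_CHECK_REF,  /* fail unless heap[query + arg] is a reference */
    OP_FOLLOW,     /* query = address stored in heap[query + arg]  */
    OP_CHECK_TAG,  /* fail unless heap[query] carries symbol arg   */
    OP_BIND,       /* bind index variable cell arg to query + 1    */
    OP_SUCCESS     /* the whole index term has been matched        */
} OpCode;
typedef struct { OpCode op; int arg; } Instr;

/* The seven instructions for the index term f(g(x)), variable x at cell 31. */
static const Instr MATCH_F_G_X[] = {
    { OP_CHECK_REF, 0 }, { OP_FOLLOW, 0 }, { OP_CHECK_TAG, 'f' },
    { OP_CHECK_REF, 1 }, { OP_FOLLOW, 1 }, { OP_CHECK_TAG, 'g' },
    { OP_BIND, 31 },     { OP_SUCCESS, 0 }
};

/* Interprets code against the query rooted at heap[query].
   Returns 1 on success and 0 on failure. */
int run(const Instr *code, Cell *heap, int query)
{
    assert(code != NULL && heap != NULL);
    for (const Instr *pc = code; ; ++pc) {
        switch (pc->op) {
        case OP_CHECK_REF:
            if (heap[query + pc->arg].kind != CELL_REF) return 0;
            break;
        case OP_FOLLOW:
            query = heap[query + pc->arg].addr;
            break;
        case OP_CHECK_TAG:
            if (heap[query].kind != CELL_SYM ||
                heap[query].tag != (char)pc->arg) return 0;
            break;
        case OP_BIND:
            heap[pc->arg].addr = query + 1;
            break;
        case OP_SUCCESS:
            return 1;
        }
    }
}
```

In this sketch every failure simply returns; a fuller version would carry a failure address per instruction, as in the TermEqual example above, so that matching can continue with the next candidate.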
1 Introduction

The problem of term indexing can be formulated abstractly as follows (see [19]). 
Given a set L of indexed terms, a binary relation R over terms (called the retrieval 
condition) and a term t (called the query term), identify the subset M of L that 
consists of the terms l such that R(l,t) holds. Terms in M will be called the
candidate terms. Typical retrieval conditions used in first-order theorem proving 
are matching, generalization, unifiability, and syntactic equality. Such a retrieval 
of candidate terms in theorem proving is interleaved with insertion of terms to 
L, and deletion of them from L. 

In order to support rapid retrieval of candidate terms, we need to process 
the indexed set into a data structure called the index. Indexing data structures 
are well-known to be crucial for the efficiency of the current state-of-the-art 
theorem provers. Term indexing is also used in logic and functional program- 
ming languages implementation, but indexing in theorem provers has several 
distinctive features: 

1. Indexes in theorem provers frequently store 10^5-10^6 complex terms, unlike
a typically small number of shallow terms in functional and logic programs. 

2. In logic or functional language implementation the index is usually con- 
structed during compilation. On the contrary, indexes in theorem proving 
are highly dynamic, since terms are frequently inserted in and deleted from 
indexes. Index maintenance operations start with an index for an initial set 
of terms L, and incrementally construct an index for another set L' that is 
obtained by insertion or deletion of terms to or from L. 

3. In many applications it is desirable for several retrieval operations to work 
on the same index structure in order to share maintenance overhead and 
memory consumption. 

* Partially supported by the Spanish CICYT project HEMOSS ref. TIC98-0949-C02- 
01 . 
Therefore, over the last two decades a significant number of results on new
indexing techniques for theorem proving have been published and successfully
applied in different provers [19,6,8,18,24,16,11,2,4,5,26,22,21]. In spite of
this, important improvements of the existing indexing techniques are still
possible and needed, and other techniques for previously not considered
retrieval operations need to be developed [14]. But implementors of provers
need to know which indexing technique is likely to behave best for their
applications, and developers of new indexing techniques need to be able to
compare techniques in order to get intuition about where to search for
improvements, and in order to provide scientific evidence of the superiority
of new techniques over previous ones^1,2.

Unfortunately, practice has revealed that an asymptotic worst-case or
average-case complexity analysis of indexing techniques is not a very realistic
enterprise. Even if such an analysis were done, it would hardly be useful in
practice.
For example, very efficient linear- or almost linear-time algorithms exist for unifi- 
cation [17,10] but in practice these algorithms proved to be inefficient for typical 
applications, and quadratic- or even exponential-time unification algorithms are 
used instead in the modern provers. Thus, theoretically worse (w.r.t. asymptotic 
complexity analysis) algorithms in this area frequently behave better in prac- 
tice than other optimal ones. For many techniques one can design bad examples 
whose (worst-case) computational complexity is as bad as for completely naive 
methods. An average-case analysis is also very difficult to realize, among other 
reasons because in most applications no realistic predictions can be made about 
the distribution of the input data. 

Similar phenomena take place when randomly generated data are used for
benchmarking. For example, in propositional satisfiability, attempts to find
hard problems resulted in the discovery of random distributions of clauses
which guarantee the existence of a phase transition [1,13]. Experimentally it
has been discovered that problems resulting from random clause generation in
the phase transition region are hard for all provers, but the provers that are
best for these problems proved not to be very efficient for more structured
problems coming from practical applications [7].

^1 In fact, in the recent past two of the authors recommended rejection for CADE
of each other’s papers on improvements of code tree and substitution tree
indexing due to the lack of evidence for a better performance.

^2 Due to the lack of evidence of superiority of some indexing techniques over
other ones, system implementors have to take decisions about implementing a
particular indexing technique based on criteria not directly relevant to the
efficiency of the technique. As an example, we cite [23]:

Following the extremely impressive results of Waldmeister, we have chosen 
a perfect discrimination tree ... as the core data structure for our indexing 
algorithms. 

Here the overall results of Waldmeister were considered enough to conclude that it 
implements the best indexing techniques. 
Hence the only reasonable evaluation method is to apply a statistical
analysis of the empirical behaviour of the different techniques on benchmarks
corresponding to real runs of real systems on real problems. But one has to be
careful: one should not draw too many conclusions about the efficiency
of different techniques from a comparison between different concrete
implementations of them; often these implementations are rather incomparable
due to different degrees of optimization and refinement^3.

Our main contribution here is the design of the first method for comparing
different implementations based on a virtually unlimited supply of large
real-world benchmarks for indexing. The basic requirements for such benchmarks
are
as follows. First, since different provers may impose different requirements on the 
indexing data structures, a general means should be given for obtaining realistic 
benchmarks for any given prover. Second, it should be possible to easily create 
benchmarks by running this prover on any problem, and do this for a significant 
number of different problems from different areas. Third, these benchmarks have 
to reproduce real-life sequences of operations on the index, where updates (dele- 
tions and insertions of terms) are interleaved with (in general far more frequent) 
retrieval operations. 

The method we use for creating such benchmarks for a given prover is to add
instructions that make the prover write a trace to a log file each time an
operation on the index takes place, and then to run it on the given problem.
For example, each time a term t is inserted (deleted, unified with), a trace
like +t (resp. -t, ut) is written to the file. Moreover, we require that the
traces be stored along with information about the result of the operation
(e.g., success/failure), which allows one to detect cases of incorrect
behaviour of the indexing methods being tested. Ideally, there should be
enough disk space to store all traces (possibly in a compressed form) of the
whole run of the prover on the given problem (if the prover terminates on the
problem; otherwise it should run at least long enough to make the benchmark
representative of a usual application of the prover).

The main part of the evaluation process is to test a given implementation 
of indexing on such a benchmark file. This given implementation is assumed to 
provide operations for querying and updating the indexing data structure, as 
well as a translation function for creating terms in its required format from the 
benchmark format. In order to avoid overheads and inexact time measurements 
due to translations and reading terms from the disk, the evaluation process first 
reads a large block of traces, storing them in main memory. After that, all terms 
read are translated into the required format. Then time measuring is switched 
on, and a loop is started which calls the corresponding sequence of operations, 
and time is turned off before reading the next block of traces from disk, and so 
on. 
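
The block-wise measurement described above might be sketched in C as follows. The types Trace and IndexImpl, the constant BLOCK_MAX, the function run_block and the two stand-in callbacks are hypothetical names chosen for this illustration, not part of the actual test package; the stand-in index only counts operations, and clock() from the C standard library is used as one possible timer.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical in-memory form of one trace: operation character plus term. */
typedef struct { char op; const char *term; } Trace;

/* Hypothetical interface an indexing implementation is assumed to provide. */
typedef struct {
    void *(*translate)(const char *term);  /* benchmark format -> own format */
    void  (*apply)(char op, void *term);   /* one query or update operation  */
} IndexImpl;

enum { BLOCK_MAX = 1024 };                 /* traces per block (assumption)  */

/* Processes one block that has already been read from disk and returns the
   CPU seconds spent in index operations only. */
double run_block(const IndexImpl *impl, const Trace *block, size_t n)
{
    void *translated[BLOCK_MAX];
    assert(impl != NULL && block != NULL && n <= BLOCK_MAX);

    for (size_t i = 0; i < n; ++i)         /* untimed: translate all terms   */
        translated[i] = impl->translate(block[i].term);

    clock_t start = clock();               /* time measuring switched on     */
    for (size_t i = 0; i < n; ++i)
        impl->apply(block[i].op, translated[i]);
    clock_t stop = clock();                /* switched off before next read  */

    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Stand-in "index" that only counts operations, for trying out the driver. */
static int dummy_ops = 0;
static void *dummy_translate(const char *term) { return (void *)term; }
static void dummy_apply(char op, void *term) { (void)op; (void)term; ++dummy_ops; }
```

A caller would sum the values returned by run_block over all blocks of a benchmark file, so that neither disk reads nor translation contribute to the reported time.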



^3 Unfortunately, in the literature one frequently encounters papers where a very tightly
coded implementation of the author’s new method is compared with other relatively 
naive implementations of the previously existing methods, sometimes even run on 
machines with different characteristics that are difficult to compare. 
This article is structured as follows. Section 2 discusses some design decisions
taken after numerous discussions among the authors. Section 3 gives benchmarks
for term retrieval and index maintenance for the problem of retrieval of
generalizations (matching a query term by index terms), generated by running
our provers Vampire [20], Fiesta [15] and Waldmeister [9] (three rather
different, we believe quite representative, state-of-the-art provers) on a
selection of carefully chosen problems from different domains of the TPTP
library [25].

In Section 4 we describe the evaluation on these benchmarks of code trees, 
context trees, and discrimination trees as they are provided and integrated in 
the test package by their own implementors, and run under identical circum- 
stances. As far as we know, this is the first time that different indexing data 
structures for deduction are compared under circumstances which, we believe, 
guarantee that experiments are not biased in any direction. Moreover, the im- 
plementations of code trees and discrimination trees we consider are the ones of 
the Vampire and Waldmeister provers, which we believe to be among the fastest 
(if not the fastest) current implementations for each one of these techniques. 
Hence for these two implementations it is unlikely that there are any differences 
in quality of coding. Although context trees are a new concept (see [3]) and the 
implementation used in the experiments (the only one that exists) was finished 
only one week before, the implementation is not naive since it is based on earlier, 
quite refined implementations of substitution trees. 

All test programs, implementations and benchmarks mentioned in this paper 
are publicly available at http://www.lsi.upc.es/~roberto. 



2 Some Design Decisions 

In this section we discuss some decisions we had to take, since they may be 
helpful for the design of similar experiments for other term indexing techniques 
in the future. 

We decided to concentrate in this paper on the measurement of three main 
aspects of indexing techniques (but the same methodology is applicable to other 
algorithms and data structures, see Section 6 for examples). 

The first aspect we focus on is the time needed for queries where the retrieval
condition is generalization: given a query term t, is there any indexed term l
such that for some substitution σ we have lσ = t? This retrieval condition has
at least two uses in first-order theorem provers: forward subsumption for unit
clauses and forward demodulation (simplification by rewriting with unit
equalities). It is also closely related (as an ingredient or prefilter) to
general forward subsumption. It is well-known to be the main bottleneck in many
provers (especially, but not only, in provers with built-in equality). Unlike
some other retrieval conditions, search for generalizations is used in both
general theorem provers and theorem provers for unit equalities. Some provers
do not even index the other operations.
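
To make the retrieval condition concrete, here is a minimal C sketch of a term-to-term test for lσ = t, without any index. It assumes terms written as prefix strings in the style of the benchmark files (letters are function symbols, a single digit is a variable) and an arity table supplied by the caller; the names subterm_end, is_generalization and MAX_VARS are hypothetical, and both terms are assumed to be well-formed.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum { MAX_VARS = 10 };   /* variables are the single digits 0..9 (assumption) */

/* Returns the index one past the end of the subterm of t starting at pos. */
static size_t subterm_end(const char *t, size_t pos, const int arity[256])
{
    int pending = 1;                       /* subterms still to be read */
    while (pending > 0 && t[pos] != '\0') {
        unsigned char c = (unsigned char)t[pos++];
        pending += (isdigit(c) ? 0 : arity[c]) - 1;
    }
    return pos;
}

/* Returns 1 if some substitution maps the indexed term l onto the query t.
   Variables occurring in t are treated as constants. */
int is_generalization(const char *l, const char *t, const int arity[256])
{
    size_t start[MAX_VARS], len[MAX_VARS];
    int bound[MAX_VARS] = {0};
    size_t j = 0;                          /* current position in t */

    assert(l != NULL && t != NULL);
    for (size_t i = 0; l[i] != '\0'; ++i) {
        unsigned char c = (unsigned char)l[i];
        if (isdigit(c)) {                  /* variable of l: consume a subterm */
            int v = c - '0';
            size_t end = subterm_end(t, j, arity);
            if (end == j) return 0;        /* t is exhausted */
            if (!bound[v]) {
                bound[v] = 1; start[v] = j; len[v] = end - j;
            } else if (len[v] != end - j ||
                       memcmp(t + start[v], t + j, len[v]) != 0) {
                return 0;                  /* repeated variable, other subterm */
            }
            j = end;
        } else {                           /* function symbol: must coincide */
            if (t[j] != l[i]) return 0;
            ++j;
        }
    }
    return t[j] == '\0';
}
```

The memcmp on a repeated variable is the linear-time subterm comparison; all other steps take constant time per symbol.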

Though time is considered to be the main factor for system comparison, 
memory consumption is also crucial, especially for long-running tests. In
recent years processor speed has grown much faster than memory capacity, and
it is foreseen that this trend will continue in the coming years. Moreover,
memory access speed is also becoming an important bottleneck. Excessive memory
consumption leads to more cache faults, which become the dominant factor for
time, instead of processor speed. In experiments described in [27], run on a rel- 
atively slow computer with 1Gbyte of RAM memory, it was observed that some 
provers consume SOOMbytes of memory in the first 30 minutes. The best modern 
computers are already about 10 times faster than the computer used in these 
experiments, which means that memory problems arise very quickly, a problem 
that will become more serious in the coming years. For all these reasons, memory 
consumption is the second object of measurement in this paper. 

The third and last aspect we focus on here is frequency of updates (insertions 
and deletions) and the time needed for them. In this field there exists an amount 
of folk knowledge (but which, unfortunately, differs among researchers), about 
the following questions. How frequent are updates in real applications? Is it 
really true that the time needed for updates is negligible? Is it worth or feasible 
to restructure larger parts of the index at update time (i.e., more than what is 
currently done)? 

Perfect filtering or not? Our definition of term indexing corresponds to perfect
filtering. In some cases an implementation of imperfect filtering, in which a
subset or a superset of the candidate terms is retrieved, does not affect
soundness and completeness of a prover. In particular, neither soundness nor
completeness is affected if only a subset of candidate terms is retrieved, when
the retrieved generalizations are only used for subsumption or forward
demodulation.

It was decided that only perfect filtering techniques should be compared. 
Of course, this includes implementations based on imperfect filters retrieving 
a superset of candidate terms, combined with a final term-to-term correctness 
test. Indeed, comparing imperfect indexing techniques alone does not make much 
sense in our context, since the clear winner in terms of time would be the system 
reporting “substitution not found” without even considering the query term. 

Should the computed substitution be constructed in an explicit form? When
search for generalizations is used for forward subsumption, computing the substi- 
tution is unnecessary. When it is used for forward demodulation, the computed 
substitution is later used for rewriting the query term. We decided that explicit 
representation of the computed substitution is unnecessary since all indexing 
techniques build such a substitution in an implicit form, and this implicit rep- 
resentation of the substitution is enough to perform rewriting. 

One or all generalizations? In term indexing, one can search for one, some, or 
all indexed candidate terms. When indexing is used for forward subsumption or 
forward demodulation, computing all candidates is unnecessary. In the case of 
subsumption, if any candidate term is found, the query term is subsumed and can 
be discarded. In the case of forward demodulation, the first substitution found
will be used for rewriting the query term, and there is no need to find
more of them either. Hence we decided to search for only one candidate. Note
that for other retrieval conditions computing all candidate terms can be more
appropriate. Examples are unification, used for inference computation, and
(backward) matching, used for backward subsumption and backward demodulation.

How should problems be selected? It was decided for this particular experiment
that every participant contributes an equal number of benchmarks, and
that the problems should be selected by each participant individually. There was
a common understanding that the problems should be as diverse as possible,
and that benchmarks should be large enough. Among the three participants,
Vampire is the only prover that can run on nonunit problems, so to achieve
diversity benchmarks generated by Vampire were taken from nonunit problems
only. Nonunit problems tend to have larger signatures than the unit ones. Quite
unexpectedly, benchmarks generated by Fiesta and Waldmeister happened to
be quite diverse too, since Fiesta generates large terms much more often than
Waldmeister.

But even if one collects such a diverse set of benchmarks, one cannot expect 
that all practical situations are covered by (statistically speaking) sufficiently 
many of them. For example, if one wants to check how a particular indexing
technique behaves on large signatures, our benchmark suite would be
inappropriate since it contains only two benchmarks in which signatures are
relatively large. Our selection of 30 problems is also inadequate if one wants
to check how
a particular technique behaves when the index contains over 10® terms, since 
the greatest number of terms in our indices is considerably less. 

But it is one of the advantages of the proposed methodology that a potentially 
unlimited number of new benchmarks with any given properties can easily be 
generated, provided that these properties correspond to those occurring in real 
problems. 

Input file format. The input file format was important for a reason not directly
related to the experiments. The benchmark files, written in any format, are
huge. Two input formats were proposed: one uses structure sharing and
refers to previously generated terms by their numbers; the other one uses a
stringterm representation of query terms. In the beginning it was believed that
the first format would give more compact files, but in practice this turned out
not to be the case since a great majority of query terms proved to be small.
In addition, files with structure sharing did not compress well, so a stringterm
representation was chosen, see Figure 1. Finally, stringterms are easy for
humans to read, which helped us a lot when one indexing data structure produced
strange results because an input file was mistakenly truncated.

But even compressed files with benchmarks occupy hundreds of megabytes,
which means that it can be difficult merely to store them in a publicly
accessible domain and to transfer them over the Web. Fortunately, this problem
was easy to solve, since these files have a relatively low Kolmogorov
complexity, an observation that, this time, can also be used in practice: they
have been produced by three much smaller generators (provers) from very short
inputs, so for reproducing the files it is enough to store the generators and
their input files.
We also reconsidered our initial ideas about how to store the terms internally
before calling the indexing operations. We experimented with complicated
shared internal representations, in order to keep the memory consumption of
the test program small (compared with the memory used by the indexing data
structures), thus avoiding noise in the experiments due to cache faults caused
by our test driver, since the driver itself might heavily occupy the cache. But
finally we found that it is better for minimising cache faults to simply read
relatively small blocks (of 2MB) of input from disk at a time, and to store
terms without any sharing, but contiguously in the same order as they will be
considered in the calls to the indexing data structure. Of course the number of
terms (i.e., of operations on the index) read from disk at a time should not be
too small, because otherwise timing is turned on and off too often, which also
produces noise.



This is an extract from a benchmark file generated by Waldmeister from the TPTP
problem LCL109-2. Comments have been added by the authors.

a/2      # each benchmark file starts with the
b/0      # signature symbols with respective arities
c/1      #
?ab0     # query term a(b,x0), "?" signals failure
?b       # query term b
+ab0     # insert term a(b,x0) to the index
!ab5     # query term a(b,x5), "!" signals success

-accbb   # delete term a(c(c(b)),b) from index

Fig. 1. An example benchmark file
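
A reader for this format could classify each line by its first character, as in the following C sketch. The enum LineKind and the function classify_line are hypothetical names, and the sketch assumes that the actual files contain no comments (those in Figure 1 were added by hand) and that signature lines have the form symbol/arity.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    LINE_SIGNATURE,   /* e.g. "a/2": symbol a with arity 2     */
    LINE_QUERY_FAIL,  /* "?t": query for t that failed         */
    LINE_QUERY_OK,    /* "!t": query for t that succeeded      */
    LINE_INSERT,      /* "+t": insertion of t into the index   */
    LINE_DELETE,      /* "-t": deletion of t from the index    */
    LINE_UNKNOWN
} LineKind;

/* Classifies one line of a benchmark file. For operation lines, *term is set
   to the start of the stringterm; otherwise it is set to NULL. */
LineKind classify_line(const char *line, const char **term)
{
    assert(line != NULL && term != NULL);
    *term = NULL;
    switch (line[0]) {
    case '?': *term = line + 1; return LINE_QUERY_FAIL;
    case '!': *term = line + 1; return LINE_QUERY_OK;
    case '+': *term = line + 1; return LINE_INSERT;
    case '-': *term = line + 1; return LINE_DELETE;
    default:
        return (line[0] != '\0' && line[1] == '/') ? LINE_SIGNATURE
                                                   : LINE_UNKNOWN;
    }
}
```

Because the expected outcome is part of every query line, a test driver can compare it with the answer of the index under test and so detect incorrect behaviour.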



Is the query term creation time included in the results? Despite the simplicity 
of this question, it was not easy to answer. Initially, it was thought that time for 
doing this should be negligible and essentially equal for the different indexing 
implementations. However, in actual provers the query terms are already avail- 
able in an appropriate format as a result of some previous operation, so there 
is no need to measure the time spent on copying the query term. In addition, 
some, but not all, participants use the flatterm representation for query terms 
and it was believed that creating flatterm query terms from similarly structured 
stringterms could be several times faster than creating terms in the tree form. 
Hence it was decided that the time for creating query terms should not be in- 
cluded in the results. 

However, this decision immediately created another problem. An expensive
single operation for matching the query term against a particular indexed term
is the comparison of two subterms of the query term. For example, to check if
the term f(x,x) is a generalization of a query term f(s,t), one has to check
whether s and t are the same term. Such a check can be done in time linear
in the size of s and t. All other operations used in term-to-term matching can
be performed in constant time (for example, comparison of two function symbols
or checking if the query term is a variable). Now suppose that everyone can
create any representation of the query term at the expense of system time. Then
one can create a perfectly shared representation in which equal subterms will be
represented by the same pointer, and checking for subterm equality becomes a
simple constant-time pointer comparison. Clearly, in practice one has to pay for
creating a perfectly shared representation, so doing this in system time would
hardly be appropriate^4. The solution we have agreed upon is this: the
representation of the query term should not allow for a constant-time subterm
comparison, and for each participating system the part of the code which
transforms the stringterm into the query term should be clear and easy to
localize (e.g., included in the test driver itself).
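
For readers unfamiliar with the notion, the following C sketch shows what a perfectly shared representation amounts to: a constructor that returns an existing node whenever the same symbol with the same argument pointers has been built before. The names Term and shared_term, the fixed-size pool and the linear search (a real implementation would use hashing) are assumptions of this illustration; it is precisely the kind of query term representation that the rule above excludes from the measurements.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_NODES = 256, MAX_ARITY = 2 };   /* small bounds for the sketch */

typedef struct Term {
    char sym;
    int arity;
    const struct Term *args[MAX_ARITY];    /* unused slots are NULL */
} Term;

static Term pool[MAX_NODES];
static size_t pool_used = 0;

/* Returns the unique node for sym(a0, a1); pass NULL for unused arguments.
   The arguments must themselves have been built by shared_term. */
const Term *shared_term(char sym, int arity, const Term *a0, const Term *a1)
{
    assert(arity >= 0 && arity <= MAX_ARITY);
    for (size_t i = 0; i < pool_used; ++i) {        /* linear lookup */
        const Term *n = &pool[i];
        if (n->sym == sym && n->arity == arity &&
            n->args[0] == a0 && n->args[1] == a1)
            return n;                               /* reuse existing node */
    }
    assert(pool_used < MAX_NODES);
    Term *n = &pool[pool_used++];
    n->sym = sym;
    n->arity = arity;
    n->args[0] = a0;
    n->args[1] = a1;
    return n;
}
```

With such a constructor, the test whether s and t in f(s,t) are the same term reduces to comparing two pointers, while the cost of the lookup is paid each time a term is built.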

3 Generation of Benchmarks 

We took 30 problems from TPTP (10 per participant), which gave 30
benchmarks. The problems and simple quantitative characteristics of
the resulting benchmarks are given in Table 1. The first two columns contain the
name of the problem and the system that generated the benchmark. The third
column contains the number of symbols in the signature of the problem; for
example, 3+4 means that the signature contains 3 nonconstant function symbols
plus 4 constants. In the following four columns we indicate the total number of
operations (Total in the table), insertions in and deletions from the index (Ins
and Del in the table), and the maximal size of the index in number of terms
(Max in the table) during the experiment. In the last four columns we show the
average size and depth of the indexed and query terms, respectively. Here by
size we mean the number of symbols in the term, and the depth is measured so
that the depth of a constant is 0.
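
The two measures can be computed directly on the stringterm representation, as the following C sketch illustrates. The functions measure and term_size_depth are hypothetical names; the sketch assumes well-formed prefix strings with single-digit variables and, as an additional assumption, gives variables depth 0 like constants.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Reads the subterm of t starting at *pos, advances *pos past it, adds its
   number of symbols to *size and returns its depth. */
static int measure(const char *t, size_t *pos, const int arity[256], int *size)
{
    if (t[*pos] == '\0') return 0;         /* guard against truncated input */
    unsigned char c = (unsigned char)t[(*pos)++];
    int n = isdigit(c) ? 0 : arity[c];     /* variables have no arguments */
    int depth = 0;
    ++*size;
    for (int k = 0; k < n; ++k) {
        int d = 1 + measure(t, pos, arity, size);
        if (d > depth) depth = d;
    }
    return depth;
}

/* Computes the size (number of symbols) and the depth of the term t. */
void term_size_depth(const char *t, const int arity[256], int *size, int *depth)
{
    size_t pos = 0;
    assert(t != NULL && t[0] != '\0' && size != NULL && depth != NULL);
    *size = 0;
    *depth = measure(t, &pos, arity, size);
}
```

For the term a(c(c(b)),b) of Figure 1, written accbb, this gives size 5 and depth 3.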

4 Evaluation 

We ran each indexing data structure on each benchmark, i.e., we did 90 experi- 
ments. The results are given in Table 2. We measured time spent and memory 
used by each system. In parentheses we put the time measured for index mainte- 
nance only (i.e. insertions and deletions). This time was measured in a second run 
of the 90 experiments, on the same benchmarks, but with all retrieval requests 
removed. 

^4 Note that the (forward) matching operation is applied to formulae just after they
have been generated, with the purpose of eliminating or simplifying them. Hence it 
is unlikely that they are in perfectly shared form, even in systems in which retained 
formulae are kept in a perfectly shared representation. 
Table 1. Benchmark characteristics

problem and               operations                     indexed terms query terms
generator      sig       Total    Ins    Del    Max   Size  Depth   Size  Depth

BOO015-4  wal  3+4      275038    228    228    179   13.8    5.8    5.1    2.2
CAT001-4  vam  3+4     2203880  18271   3023  16938   26.0    9.9    7.1    2.9
CAT002-3  vam  4+4     2209777  12934   4181  11191   29.2   10.4    7.1    3.1
CAT003-4  vam  3+4     2151408  18159   4476  16606   28.3   10.6    7.1    2.9
CIV003-1  vam  6+14    3095080  70324  22757  47567   14.1    5.1    4.3    1.6
COL002-5  fie  2+7      940127  13399   5329   8353   25.4    9.3    7.5    2.8
COL004-3  fie  2+5     1176507    765     28    737   18.1    6.7    9.3    3.2
COL079-2  vam  3+2     2143156  14236   4619   9633   38.8   11.8    7.3    2.7
GRP024-5  wal  3+4     2686810    506    506    296   16.3    6.6    7.7    2.9
GRP164-1  fie  5+4    11069073  53934   3871  50063   16.4    6.1    5.9    2.4
GRP179-2  fie  5+2    10770018  52825   2955  49870   16.8    6.1    6.1    2.4
GRP187-1  wal  4+3     9999990   2714   1327   1387   12.0    5.3    5.0    2.0
GRP196-1  fie  2+2    18977144      3      0      3   26.3    7.0   10.8    4.8
HEN011-2  vam  1+10    4313282   4408    439   3969   10.7    4.1    2.7    0.8
LAT002-1  vam  2+6     2646466  26095   1789  24306   17.5    6.2    5.6    2.0
LAT009-1  wal  2+3     2514005    596    596    291   19.2    7.4    7.8    2.8
LAT020-1  wal  2+3     9999992    910    493    417   14.8    5.6    9.3    3.2
LAT023-1  fie  3+4      822539   4088   2065   2499   18.4    6.6    6.0    2.3
LAT026-1  fie  3+4      772413   6162   3509   4770   20.8    7.1    5.6    2.0
LCL109-2  fie  3+4      312992   4465    519   3947   16.7    6.1    6.0    2.6
LCL109-2  wal  2+1      463493    196    196    165   19.2    7.9    5.7    2.3
LCL109-4  vam  4+3     1944335  40949   3135  37817   20.0    5.7    8.1    2.7
RNG020-6  fie  6+6     2107343   4872    960   3912   17.4    5.4    6.3    2.4
RNG028-5  wal  5+4     3221510    304    304    218   31.4    9.0   14.0    3.8
RNG034-1  vam  4+4     2465088  15068   4589  11685   26.4    7.0    6.1    2.4
RNG035-7  wal  3+5     5108975    482    482    360   21.7    9.0   13.9    4.7
ROB006-2  wal  3+4     9999990   1182     34   1148   19.1    7.7   12.4    4.9
ROB022-1  fie  3+4      922806   2166    826   1341   21.4    9.3    6.5    2.8
ROB026-1  wal  2+4     9999991    648     15    633   16.9    7.9   12.6    5.0
SET015-4  vam  4+6     3664777   3256    995   2261   16.3    5.8    3.6    1.4


Time. Vampire's implementation of code trees is on average 1.39 times faster
than Waldmeister's implementation of discrimination trees and 1.91 times faster
than the current implementation of context trees (*). These factors roughly
hold for almost all problems, with a few exceptions; for example, on problem
CIV003-1 Waldmeister is slightly faster than Vampire. The time spent on index
maintenance is in all cases negligible compared to the retrieval time, and
there are essentially no fluctuations from the average figures: Waldmeister
spends 1.18 times less time on index maintenance than Vampire and 1.49 times
less than the context trees implementation.

(*) Note that in these average times problems with large absolute run-times
predominate. Furthermore, context trees are new (see [3]) and several important
optimizations for them have not yet been implemented. According to the first
author, they are at least 3 times faster than the substitution trees previously
used in Fiesta. We welcome anyone who has an efficient implementation of
substitution trees (or any other indexing data structure) to compare her or his
technique with ours on the same benchmarks.

Memory. In memory consumption, the differences are more significant. On
average, the implementation of context trees used 1.18 times less memory than
Vampire's implementation of code trees and 5.39 times less memory than
Waldmeister's implementation of discrimination trees.
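
The time factors quoted in the Time paragraph can be recomputed from the totals row of Table 2. The following is only a small arithmetic sketch; the dictionary layout and the names `totals` and `ratio` are illustrative choices, not part of the benchmark software.

```python
# Totals row of Table 2: (total time, index-maintenance time), in seconds.
totals = {
    "Vam": (161.09, 5.48),   # Vampire, code trees
    "Wal": (224.39, 4.64),   # Waldmeister, discrimination trees
    "Con": (308.31, 6.90),   # context trees
}

def ratio(a, b):
    """Quotient a/b rounded to two decimals, as in the text."""
    return round(a / b, 2)

# Total time relative to Vampire.
wal_vs_vam = ratio(totals["Wal"][0], totals["Vam"][0])
con_vs_vam = ratio(totals["Con"][0], totals["Vam"][0])

# Index-maintenance time relative to Waldmeister.
vam_maint_vs_wal = ratio(totals["Vam"][1], totals["Wal"][1])
con_maint_vs_wal = ratio(totals["Con"][1], totals["Wal"][1])

print(wal_vs_vam, con_vs_vam, vam_maint_vs_wal, con_maint_vs_wal)
```

The four values agree with the factors 1.39, 1.91, 1.18 and 1.49 given in the text.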



5 A Short Interpretation of the Results

Although the main aim of this work was to design a general-purpose technique 
for measuring and comparing the efficiency of indexing techniques in time and 
space requirements, in this section we very briefly and globally describe why we 
believe these concrete experiments have produced these concrete results. 

5.1 Waldmeister’s Discrimination Trees 

Waldmeister uses discrimination trees, which are like tries where terms are seen
as strings and common prefixes are shared, in their so-called perfect variant.
Nodes are arrays of pointers and can have different sizes: each node can have a
child for each function symbol and for each variable that has already occurred
along the path in the tree, plus an additional child for a possible new
variable. During matching one can index the array by the currently treated
query symbol f, and directly jump to the child for f, if it exists, or to one of
the variable children. Note that the case where children for both f and some
variable exist (or for more than one variable) is the only situation where
backtracking points are created. Usually, queries are represented as the
so-called flatterms of [2], which are linked lists with additional pointers to
jump over subterms t when a variable of the index gets instantiated by t.
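
The structure just described can be illustrated with a minimal sketch of a perfect discrimination tree used to retrieve generalizations of a ground query. It is not Waldmeister's implementation: nodes use dictionaries instead of pointer arrays, queries are nested tuples rather than flatterms, and the names `Node`, `flatten`, `insert` and `retrieve` are assumptions made for this example. Terms are tuples `(symbol, arg1, ...)`; variables of indexed terms are plain strings.

```python
class Node:
    def __init__(self):
        self.children = {}   # key -> Node; key is ("fun", symbol, arity) or ("var", i)
        self.terms = []      # indexed terms whose preorder string ends here

def flatten(term, varmap):
    """Preorder key list; variables are numbered by first occurrence."""
    if isinstance(term, str):                      # variable of an indexed term
        return [("var", varmap.setdefault(term, len(varmap)))]
    head, *args = term
    keys = [("fun", head, len(args))]
    for arg in args:
        keys += flatten(arg, varmap)
    return keys

def insert(root, term):
    node = root
    for key in flatten(term, {}):
        node = node.children.setdefault(key, Node())
    node.terms.append(term)

def _walk(node, todo, bindings, found):
    if not todo:                                   # whole query consumed
        found.extend(node.terms)
        return
    t, rest = todo[0], todo[1:]
    head, *args = t
    child = node.children.get(("fun", head, len(args)))
    if child is not None:                          # follow the function symbol
        _walk(child, tuple(args) + rest, bindings, found)
    for key, child in node.children.items():       # variable children: skip subterm t
        if key[0] != "var":
            continue
        i = key[1]
        if i < len(bindings):                      # variable seen before on this path
            if bindings[i] == t:
                _walk(child, rest, bindings, found)
        else:                                      # new variable, bind it to t
            _walk(child, rest, bindings + (t,), found)

def retrieve(root, query):
    """All indexed terms that generalize the ground term `query`."""
    found = []
    _walk(root, (query,), (), found)
    return found

a, b = ("a",), ("b",)
root = Node()
for term in [("f", "x", "y"), ("f", "x", "x"), ("f", a, "y"), ("g", a)]:
    insert(root, term)
print(retrieve(root, ("f", a, b)))
```

Backtracking arises exactly where a node has both a matching function child and one or more variable children, which mirrors the remark in the text.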

The results of our experiments for Waldmeister’s discrimination trees are not 
unexpected. The implementation is very tightly coded, and in spite of the lower 
amount of sharing than in other techniques, the retrieval speed is very high 
because the low-level operations (essentially, symbol comparison and variable 
instantiation) are very simple. It is clear that, especially for larger signatures 
and deep terms, the kind of nodes used leads to high memory consumption. 
Memory consumption is even higher because the backtracking stack needed for 
retrieval is inscribed into the tree nodes, enlarging them even more. This speeds 
up retrieval at the cost of space. 

Table 2. Results. In this table, we list the benchmarks, named after the TPTP
problem they come from, along with the prover run on the given problem. Vam
stands for Vampire's code tree implementation, Wal for Waldmeister's
discrimination trees, and Con for the context trees implementation. Between
parentheses, times without retrievals, i.e., only insertions and deletions.

                 time (in seconds)                                 memory (in Kbytes)
problem   from   Vam              Wal              Con                Vam     Wal    Con
BOO015-4  wal      0.25 (0.00)      0.31 (0.01)      0.46 (0.01)       11     575     11
CAT001-4  vam      3.28 (0.34)      5.74 (0.31)      7.11 (0.41)     3859   13786   3109
CAT002-3  vam      2.90 (0.22)      5.51 (0.31)      6.67 (0.33)     2483    9281   2021
CAT003-4  vam      3.21 (0.34)      5.82 (0.34)      6.90 (0.46)     3826   13595   3086
CIV003-1  vam      7.57 (0.93)      7.13 (0.65)     15.57 (1.16)     3754   22664   3081
COL002-5  fie      1.30 (0.17)      1.55 (0.21)      2.61 (0.28)      925    6090    922
COL004-3  fie      0.96 (0.00)      1.22 (0.00)      2.39 (0.02)       80     727     86
COL079-2  vam      5.46 (0.29)      8.41 (0.26)      7.24 (0.43)     2769    9158   2138
GRP024-5  wal      3.54 (0.01)      4.82 (0.00)      7.44 (0.01)       19     591     22
GRP164-1  fie     17.60 (0.72)     24.60 (0.61)     32.06 (0.89)     5823   28682   5352
GRP179-2  fie     18.34 (0.71)     24.25 (0.60)     32.40 (0.87)     5597   29181   5207
GRP187-1  wal     10.44 (0.02)     11.68 (0.03)     17.64 (0.02)       96     903     97
GRP196-1  fie      6.96 (0.00)     11.92 (0.00)     15.45 (0.00)        1     543      1
HEN011-2  vam      3.36 (0.03)      3.39 (0.02)      5.18 (0.04)      221    2069    211
LAT002-1  vam      5.83 (0.32)      7.72 (0.29)      9.48 (0.44)     3164   14603   2554
LAT009-1  wal      3.78 (0.01)      4.97 (0.01)      5.97 (0.01)       19     591     20
LAT020-1  wal     17.73 (0.01)     24.97 (0.01)     29.87 (0.01)       30     631     31
LAT023-1  fie      1.10 (0.04)      1.49 (0.03)      1.92 (0.07)      198    1646    210
LAT026-1  fie      1.11 (0.09)      1.49 (0.07)      1.79 (0.12)      373    2813    371
LCL109-2  fie      0.47 (0.04)      0.65 (0.06)      0.80 (0.05)      508    2285    466
LCL109-2  wal      0.49 (0.00)      0.66 (0.00)      0.82 (0.00)       16     591     15
LCL109-4  vam      5.62 (0.70)      7.65 (0.46)     13.02 (0.72)     6703   24403   4986
RNG020-6  fie      2.25 (0.07)      3.19 (0.05)      5.33 (0.08)      544    2435    517
RNG028-5  wal      4.19 (0.01)      6.66 (0.01)      9.08 (0.01)       28     607     29
RNG034-1  vam      3.27 (0.33)      4.95 (0.21)      6.86 (0.34)     2545    8330   2125
RNG035-7  wal      8.19 (0.01)     12.10 (0.01)     18.55 (0.01)       36     647     37
ROB006-2  wal      9.88 (0.01)     14.31 (0.02)     21.60 (0.02)      128    1142    116
ROB022-1  fie      0.92 (0.03)      1.20 (0.03)      2.23 (0.03)      119    1086    101
ROB026-1  wal      8.52 (0.01)     13.35 (0.01)     17.34 (0.01)       69     807     68
SET015-4  vam      2.54 (0.02)      2.69 (0.02)      4.53 (0.05)      314    1373    258
total            161.09 (5.48)    224.39 (4.64)    308.31 (6.90)    44258  201835  37248

5.2 Context Trees

As we have mentioned, in discrimination trees common prefixes are shared. In
substitution trees [6], terms keep their tree structure and all common contexts
can be shared (note that this includes the common prefixes of the terms seen as
strings). Context trees are a new indexing data structure, where, by means of a
limited kind of context variables, also common subterms can be shared, even if
they occur below different function symbols (see [3] for all details). More
sharing allows one to avoid repeated work (this is of course the key to all
indexing techniques).

The basic idea is the following. Assume one has three terms h(x, f(t)),
h(x, g(t)), and h(b, f(t)) in the index, and let the query be h(b, f(s)) where
the terms s and t are large and s is not an instance of t. In a discrimination
tree (and in a substitution tree) t will occur three times, and two repeated
attempts will be made for matching s against t. In a context tree, the root
contains h(x_1, F(t)) (where F can be instantiated by a single function symbol
of any arity) and an immediate failure will occur without exploring any further
nodes.
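
The effect described above can be imitated with a toy matcher that records which patterns it visits. This is only a sketch under simplifying assumptions and not the context tree data structure of [3]: there is no tree, function variables are heads written with a leading `$` and must agree in arity with the query subterm, and the names `match`, `attempts_on_t` and the small terms standing in for the "large" s and t are all hypothetical.

```python
def match(pattern, term, subst, trace):
    """One-sided matching of `pattern` against a ground `term`.

    Ordinary variables are strings; compound terms are tuples whose head may be
    a function variable such as "$F". Every visited pattern is logged in `trace`.
    """
    trace.append(pattern)
    if isinstance(pattern, str):                      # ordinary variable
        return subst.setdefault(pattern, term) == term
    head, *pargs = pattern
    thead, *targs = term
    if len(pargs) != len(targs):
        return False
    if head.startswith("$"):                          # function variable
        if subst.setdefault(head, thead) != thead:
            return False
    elif head != thead:
        return False
    return all(match(p, q, subst, trace) for p, q in zip(pargs, targs))

def attempts_on_t(patterns, query, t):
    """How often matching against the subterm t is started."""
    count = 0
    for pattern in patterns:
        trace = []
        match(pattern, query, {}, trace)
        count += trace.count(t)
    return count

a, b, c = ("a",), ("b",), ("c",)
t = ("k", a, a)                  # stands for a large term
s = ("k", a, c)                  # not an instance of t
query = ("h", b, ("f", s))

separate = [("h", "x", ("f", t)), ("h", "x", ("g", t)), ("h", b, ("f", t))]
shared_root = [("h", "x1", ("$F", t))]

print(attempts_on_t(separate, query, t), attempts_on_t(shared_root, query, t))
```

With the three terms handled separately, s is compared against t twice; with the shared root h(x_1, F(t)) it is compared once, and the failure makes the terms below the root irrelevant.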

These experiments were run on a first implementation (which does not yet 
include several important enhancements, see [3]), based on curried terms and on 
an extension of substitution trees with equality constraints and where one does 
not distinguish between internal and external variables. Due to the high amount 
of sharing, context trees need little space, while still being suitable for compiling 
(see below). 

5.3 Vampire’s Code Trees 

On the one hand, Vampire's code trees can be seen as a form of compiled
discrimination trees. One of the important conclusions that can be drawn from
the experiments is that it pays off to compile an index into a form of
interpreted abstract instructions (similar findings have also been obtained in
the field of logic programming). Consider for instance a typical algorithm for
term-to-term matching TermMatch(query, set). It is clear that the concrete term
set is known at compile (i.e. index update) time, and that a specialized
algorithm TermMatchWithSet(query) can be much more efficient. If the whole index
tree is compiled, then all control instructions also become unnecessary (asking
whether a child exists, or whether a sibling exists, or whether the node is a
leaf, etc.). These advantages give far more speedup than the overhead for
interpreting the operation code of the abstract instructions (which can be a
constant-time computed switch statement).
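
The specialization idea can be sketched for a single indexed term: the term is compiled once into a linear sequence of abstract instructions, and matching a query then amounts to interpreting that sequence. This is a simplified illustration, not Vampire's code trees: real code trees share instruction prefixes between terms and contain failure links. The instruction names CHECK, BIND and COMPARE and the functions `compile_term` and `run` are assumptions of this sketch.

```python
def compile_term(term):
    """Compile an indexed term into a list of abstract instructions.

    Variables are strings; compound terms are tuples (symbol, arg1, ...).
    """
    code, varmap = [], {}

    def emit(t):
        if isinstance(t, str):
            if t in varmap:                       # repeated variable: equality test
                code.append(("COMPARE", varmap[t]))
            else:                                 # first occurrence: bind a register
                varmap[t] = len(varmap)
                code.append(("BIND", varmap[t]))
        else:
            head, *args = t
            code.append(("CHECK", head, len(args)))
            for arg in args:
                emit(arg)

    emit(term)
    return code

def run(code, query):
    """Interpret `code` on a ground query; return register bindings or None."""
    stack, regs = [query], {}
    for instr in code:
        t = stack.pop()
        op = instr[0]
        if op == "CHECK":
            if t[0] != instr[1] or len(t) - 1 != instr[2]:
                return None
            stack.extend(reversed(t[1:]))         # leftmost argument on top
        elif op == "BIND":
            regs[instr[1]] = t
        else:                                     # COMPARE
            if regs[instr[1]] != t:
                return None
    return regs

a, b = ("a",), ("b",)
code = compile_term(("f", "x", ("g", "x")))
print(code)
print(run(code, ("f", a, ("g", a))), run(code, ("f", a, ("g", b))))
```

Note that the interpreter never asks whether a child or sibling exists: all such structural questions were settled when the term was compiled.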

But code trees are more than compiled discrimination trees. Let us first
consider standard discrimination trees, where all variables are represented as a
general variable symbol *, e.g., different terms like f(x, y) and f(x, x) are
both seen as f(*, *), and the corresponding path in the tree is common to both.
This increases the amount of sharing, and also the retrieval speed, because the
low-level operations are simpler. But it is only a prefilter: once a possible
match has been found, additional equality tests have to be performed between the
query subterms by which the variables of terms like f(x, x) have been
instantiated. One important optimization is to perform the equality tests not
always in the leaves, but as high up as possible in the tree without decreasing
the amount of sharing. This is how it is done in partially adaptive code trees
[21]. Doing the equality tests in the leaves would frequently lead to repeated
work during retrieval. Also, according to the first-fail principle (which is
often beneficial in indexing techniques), it is important to impose strong
restrictions like the equality of two whole subterms as soon as possible.
Because it is conceptually easier to move instructions than to restructure a
discrimination tree, code trees are well suited for this purpose. Finally, code
is also compact and hence space-efficient, because one complex instruction can
encode (matching with) a relatively specific term structure.

6 Conclusions and Related Work 

In Graf's book on term indexing [6], the benchmarks are a small number of sets
of terms (each set having between 500 and 10000 terms), coming from runs with 
Otter [12]. In other experiments also randomly generated terms are used. In all 
of Graf’s experiments first the index is built from one set and then retrieval oper- 
ations are done using as queries all terms of another set over the same signature. 
As said, the drawback of such an analysis is that it is unclear how representa- 
tive the sets of Otter terms are and how frequent updates are in relation with 
retrieval operations. In addition, in real provers updates are interleaved with 
queries, which makes it difficult to construct optimal indexes, especially in the 
case of highly adaptive structures such as substitution trees or context trees. 
Furthermore, in such experiments it is usually unclear whether the quality of 
coding of the different techniques is comparable. Although the quality of code in 
our provers can also be questioned, at least the authors of the discrimination tree 
and code tree implementations believe that their code is close to optimal. Note 
that, unlike all previously published papers, the systems participating in the 
experiments are competitors, and benchmarks were generated by each of them 
independently, so it is unlikely that the results are biased toward a particular 
technique. 

We expect more researchers to use our benchmarks and programs, which are 
available at http://www.lsi.upc.es/~roberto, and report on the behaviour 
of new indexing techniques. A table of results will be maintained as well at that 
web site. 

Other algorithms and data structures for indexing could be compared using 
our framework; let us mention but a few: 

1. retrieval of instances (used in backward subsumption by unit clause); 

2. retrieval of instances on the level of subterms (used in backward demodula- 
tion); 

3. retrieval of unifiable terms; 

4. forward subsumption on multiliteral clauses; 

5. backward subsumption on multiliteral clauses. 

Other interesting algorithms are related to the use of orderings: 

1. comparison of terms or literals in the lexicographic path order or the
Knuth-Bendix order;

2. retrieval of generalizations together with checking that some ordering
conditions are satisfied.

We invite anyone interested to contact the authors about the design of
benchmark suites for these problems within the framework of this paper.



Acknowledgments. We thank Jurgen Avenhaus for valuable remarks to yes- 
terday’s version of this paper which was as a result substantially rewritten today. 



References 

1. P. Cheeseman, B. Kanefsky, and W. M. Taylor. Where the really hard problems 
are. In R. Reiter J. Mylopoulos, editor, IJCAI 1991. Proceedings of the 12th 
International Joint Conference on Artificial Intelligence, pages 331-340, Sydney, 
Australia, 1991. Morgan Kaufmann. 

2. J. Christian. Flatterms, discrimination nets, and fast term rewriting. Journal of 
Automated Reasoning, 10(1):95-113, February 1993. 

3. H. Ganzinger, R. Nieuwenhuis, and P. Nivela. Context trees. In IJCAR 2001,
Proceedings of the International Joint Conference on Automated Reasoning, Lecture
Notes in Artificial Intelligence, Siena, Italy, June 2001. Springer Verlag.

4. P. Graf. Extended path-indexing. In A. Bundy, editor, CADE-12. 12th
International Conference on Automated Deduction, volume 814 of Lecture Notes in
Artificial Intelligence, pages 514-528, Nancy, France, June/July 1994.

5. P. Graf. Substitution tree indexing. In J. Hsiang, editor, Procs. 6th International
Conference on Rewriting Techniques and Applications (RTA-95), volume 914 of
Lecture Notes in Computer Science, pages 117-131, Kaiserslautern, 1995.

6. P. Graf. Term Indexing, volume 1053 of Lecture Notes in Computer Science.
Springer Verlag, 1996.

7. J. Gu, P.W. Purdom, J. Franco, and B.W. Wah. Algorithms for the Satisfiability 
Problems. Cambridge University Press, 2001. 

8. C. Hewitt. Description and theoretical analysis of Planner: a language for proving 
theorems and manipulating models in a robot. PhD thesis. Department of Mathe- 
matics, MIT, Cambridge, Mass., January 1971. 

9. T. Hillenbrand, A. Buch, R. Vogt, and B. Lochner. Waldmeister: High-performance 
equational deduction. Journal of Automated Reasoning, 18(2):265-270, 1997. 

10. A. Martelli and U. Montanari. An efficient unification algorithm. ACM Transac- 
tions on Programming Languages and Systems, 4(2):258-282, 1982. 

11. W.W. McCune. Experiments with discrimination-tree indexing and path indexing 
for term retrieval. Journal of Automated Reasoning, 9(2):147-167, 1992. 

12. W.W. McCune. OTTER 3.0 reference manual and guide. Technical Report ANL- 
94/6, Argonne National Laboratory, January 1994. 

13. D.G. Mitchell, B. Selman, and H.J. Levesque. Hard and easy distributions of SAT 
problems. In W.R. Swartout, editor, Procs. 10th National Conference on Artificial 
Intelligence, pages 459-465, San Jose, CA, January 1992. AAAI Press/MIT Press. 

14. R. Nieuwenhuis. Rewrite-based deduction and symbolic constraints. In 
H. Ganzinger, editor, CADE-16. 16th Int. Conf. on Automated Deduction, Lec- 
ture Notes in Artificial Intelligence, pages 302-313, Trento, Italy, July 1999. 

15. R. Nieuwenhuis, J.M. Rivero, and M.A. Vallejo. The Barcelona prover. Journal of 
Automated Reasoning, 18(2): 171-176, 1997. 







16. H.J. Ohlbach. Abstraction tree indexing for terms. In H.-J. Bürckert and W. Nutt,
editors, Extended Abstracts of the Third International Workshop on Unification,
pages 131-135. Universität Kaiserslautern, 1989. SEKI-Report SR 89-17.

17. M. Paterson and M. Wegman. Linear unification. Journal of Computer and System 
Sciences, 16:158-167, 1978. 

18. P.W. Purdom and C.A. Brown. Fast many-to-one matching algorithms. In
J.-P. Jouannaud, editor, Rewriting Techniques and Applications, First International
Conference, RTA-85, volume 202 of Lecture Notes in Computer Science, pages
407-416, Dijon, France, 1985. Springer Verlag.

19. I.V. Ramakrishnan, R. Sekar, and A. Voronkov. Term indexing. In A. Robinson 
and A. Voronkov, editors. Handbook of Automated Reasoning, pages 1-97. Elsevier 
Science and MIT Press, 2001. To appear. 

20. A. Riazanov and A. Voronkov. Vampire. In H. Ganzinger, editor, CADE-16. 16th 
International Conference on Automated Deduction, volume 1632 of Lecture Notes 
in Artificial Intelligence, pages 292-296, Trento, Italy, July 1999. 

21. A. Riazanov and A. Voronkov. Partially adaptive code trees. In M. Ojeda-Aciego,
I.P. de Guzmán, G. Brewka, and L.M. Pereira, editors, Logics in Artificial
Intelligence. European Workshop, JELIA 2000, volume 1919 of Lecture Notes in Artificial
Intelligence, pages 209-223, Malaga, Spain, 2000. Springer Verlag.

22. J.M.A. Rivero. Data Structures and Algorithms for Automated Deduction with 
Equality. Phd thesis, Universitat Politecnica de Catalunya, Barcelona, May 2000. 

23. S. Schulz. Learning Search Control Knowledge for Equational Deduction, volume
230 of Dissertationen zur Künstlichen Intelligenz. Akademische Verlagsgesellschaft
Aka GmbH, 2000.

24. M. Stickel. The path indexing method for indexing terms. Technical Report 473, 
Artificial Intelligence Center, SRI International, Menlo Park, CA, October 1989. 

25. G. Sutcliffe and C. Suttner. The TPTP problem library — CNF release v. 1.2.1.
Journal of Automated Reasoning, 21(2), 1998.

26. A. Voronkov. The anatomy of Vampire: Implementing bottom-up procedures with 
code trees. Journal of Automated Reasoning, 15(2):237-265, 1995. 

27. A. Voronkov. CASC 16 Preprint CSPP-4, Department of Computer Science,
University of Manchester, February 2000.




Preferred Extensions of Argumentation 
Frameworks: Query, Answering, and 
Computation 



Sylvie Doutre and Jerome Mengin 

Institut de Recherche en Informatique de Toulouse 
Universite Paul Sabatier 

118 route de Narbonne - F-31062 Toulouse cedex 4 
{doutre,mengin}@irit.fr



Abstract. The preferred semantics for argumentation frameworks 
seems to capture well the intuition behind the stable semantics while 
avoiding several of its drawbacks. Although the stable semantics has 
been thoroughly studied, and several algorithms have been proposed for 
solving problems related to it, it seems that the algorithmic side of the 
preferred semantics has received less attention. In this paper, we propose 
algorithms, based on the enumeration of some subsets of a given set of 
arguments, for the following tasks: 1) deciding if a given argument is in 
a preferred extension of a given argumentation framework; 2) deciding 
if the argument is in all the preferred extensions of the framework; 3) 
generating the preferred extensions of the framework. 



1 Introduction 

Argumentation frameworks of [Dun95,BDKT97] abstract many logical systems 
that have been used to formalize common-sense reasoning or to give a meaning 
to logic programs. Argumentation frameworks provide a unifying tool for the 
study of several aspects of these systems, notably their semantics. 

Underlying these systems is some notion of deduction, often nonmonotonic. 
One important consequence of nonmonotonicity is that the set of consequences 
that can be deduced from such a system is usually inconsistent in the sense of 
classical logic. The semantics is then given in terms of extensions, where each 
extension corresponds to a possible interpretation of the world. All extensions 
of a theory are maximally consistent subsets of the set of nonmonotonic conse- 
quences of the theory, but various semantics accept more or less of these subsets 
as extensions. Probably the most widespread semantics is the stable semantics, 
which is also the most restrictive one. Its definition is very intuitive, but it gives 
problematic results for many theories; in particular, there are theories which 
have no stable extension. [Dun95] has proposed another semantics, the preferred 
semantics for argumentation frameworks, which seems to capture the intuition 
behind the stable semantics while avoiding several of its drawbacks. 

The stable semantics has been thoroughly studied, and several algorithms
have been proposed for the computation of stable extensions of nonmonotonic
theories. Algorithms by e.g. [Rei87,Lev91,DMP97,Nie95] rely on some
underlying monotonic deduction system that provides information about the possible
conflicts between arguments: they then compute extensions so as to avoid these 
conflicts within each single extension. It has been shown by e.g. [Ino92] that 
there is a strong connection between the computation of the possible conflicts 
and consequence finding. These conflicts can be captured in a directed graph, 
and [CL91,DT93,DM94] have proved that stable extensions of an argumentation 
framework correspond to particular subsets of this graph called kernels in graph 
theory [Ber73]. This connection between graphs and nonmonotonic theories is 
the basis for an algorithm in [DMP97] that computes stable extensions of a 
default theory. 

It seems that the algorithmic side of the preferred semantics has received less 
attention. The worst-case analyses of [DNT99,DNT00] show that the preferred
semantics is at least as hard to compute as the stable semantics, and often harder 
(depending on what are accepted as consequences of argumentation frameworks: 
the union of the extensions or their intersection). One question related to the 
preferred semantics has been studied by a number of authors: given an argu- 
mentation framework and an argument, is there a preferred extension of the 
framework that contains the argument? Vreeswijk and Prakken [VP00] propose
a proof theory to answer that question, but stop short of actually giving an 
algorithm that would implement their proof theory. [DKT96] propose a proof 
procedure to answer the same question but in the restricted case where the 
argumentation framework corresponds to a logic program. [DM00] propose an 
algorithm that computes all the preferred extensions of an argumentation frame- 
work. This algorithm is based on a technique called set-enumeration in [Rym92], 
which can be used to generate all the subsets of a given set. This technique has 
been applied to many problems studied in Artificial Intelligence, like proposi- 
tional deduction or the computation of stable extensions of some nonmonotonic 
logics. In this paper, we propose algorithms, also based on the enumeration of 
the subsets of a given set of arguments, for the following tasks: 1) deciding if 
a given argument is in an extension of a given argumentation framework; 2) 
deciding if the argument is in all the extensions of the framework; 3) generating 
the extensions of the framework. In particular, the third algorithm improves on 
that presented in [DM00].

The paper is organized as follows: the next section presents the preferred
semantics of argumentation frameworks, and a characterization in terms of
graph-theoretic concepts. Set-enumeration is described in section 3, where we
also exhibit properties of preferred extensions that can be used to reduce the
number of generated sets of arguments. The algorithms are described in section
4. Section 5 concludes the paper with a comparison of our algorithms with
related work and a discussion of possibilities for further enhancements. Proofs
of the results can be found in [DM01].

2 The Preferred Semantics for Argumentation Frameworks

In this section, we present Dung’s argumentation framework [Dun95] and the 
preferred semantics. 

Definition 1. [Dun95] An argumentation framework is a pair $(A, R)$ where $A$
is a set of arguments and $R$ is a binary relation over arguments, i.e.
$R \subseteq A \times A$. Given two arguments $a$ and $b$, $(a, b) \in R$ or
$a\,R\,b$ means $a$ attacks $b$ ($a$ is said to be an attacker of $b$).
Moreover, we say that a set $S \subseteq A$ of arguments attacks an argument
$a$ if some argument $b$ in $S$ attacks $a$.

Thus an argumentation framework can be simply represented as a directed graph,
where vertices are the arguments and edges correspond to the elements of $R$.
Given an argument $a \in A$, we denote by
$R^{+}(a) = \{b \in A \mid (a, b) \in R\}$ the set of the successors of $a$, by
$R^{-}(a) = \{b \in A \mid (b, a) \in R\}$ the set of its predecessors, and by
$R^{\pm}(a)$ the set $R^{+}(a) \cup R^{-}(a)$. Moreover, given a set
$S \subseteq A$ of arguments and $\varepsilon \in \{+, -, \pm\}$,
$R^{\varepsilon}(S) = \bigcup_{a \in S} R^{\varepsilon}(a)$.

Example 1. Let $AF_1 = (A, R)$ with $A = \{a, b, c, d, e, f, g, h, j, k\}$ and
$R$ as indicated on the graph below. We use this example throughout this
section to illustrate our definitions and propositions, and then as a running
example for our algorithms.

[Figure: directed graph of the attack relation $R$ of $AF_1$]



Definition 2. Let $(A, R)$ be an argumentation framework. An argument $a \in A$
is defended by a set $S \subseteq A$ of arguments (or $S$ defends $a$) if and
only if $\forall b \in A$, if $b\,R\,a$ then $S$ attacks $b$, i.e.
$\exists c \in S$ such that $c\,R\,b$. A set $S \subseteq A$ is conflict-free
if and only if there are no arguments $a$ and $b$ in $S$ such that $a$ attacks
$b$. A set $S \subseteq A$ is admissible if and only if $S$ is conflict-free
and $\forall x \in S$, $S$ defends $x$.
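
Definitions 1 and 2 translate almost literally into set operations. The sketch below assumes that the attack relation is given as a Python set of pairs; the function names and the small framework used to try them out are hypothetical (it is not the framework of Example 1).

```python
def successors(R, S):
    """R^+(S): arguments attacked by some element of S."""
    return {b for (a, b) in R if a in S}

def predecessors(R, S):
    """R^-(S): arguments attacking some element of S."""
    return {a for (a, b) in R if b in S}

def conflict_free(R, S):
    return not (successors(R, S) & S)

def defends(R, S, x):
    """S defends x: every attacker of x is attacked by S."""
    return predecessors(R, {x}) <= successors(R, S)

def admissible(R, S):
    return conflict_free(R, S) and all(defends(R, S, x) for x in S)

# Hypothetical framework: a and b attack each other, b attacks c, c attacks d.
R = {("a", "b"), ("b", "a"), ("b", "c"), ("c", "d")}
print(admissible(R, {"a", "c"}), admissible(R, {"a", "d"}))
```

Here {a, c} is admissible because a attacks b, the only attacker of both a and c, whereas {a, d} is not, since nothing in it attacks c.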

Dung [Dun95] defines the preferred semantics of an argumentation framework 
by the set of preferred extensions. We recall below Dung’s definition, and give a 
characterization of preferred extensions in terms of graph-theoretic concepts: 

Definition 3. Given an argumentation framework $(A, R)$, a set $S \subseteq A$
is a preferred extension if and only if: 1) $S$ is conflict-free; 2) $S$
defends every argument it contains; 3) $S$ is $\subseteq$-maximal such that 1
and 2 hold. The set of the preferred extensions of $(A, R)$ is denoted by
$\mathrm{Pref}(A, R)$.

Proposition 1. Given an argumentation framework $(A, R)$, a subset $S$ of $A$
is a preferred extension if and only if the following conditions hold:
1) $R^{+}(S) \cap S = \emptyset$; 2) $R^{-}(S) \subseteq R^{+}(S)$; 3) for
every non-empty $X \subseteq A - S$, $X \cap R^{+}(S \cup X) \neq \emptyset$ or
$R^{-}(X) \not\subseteq R^{+}(S \cup X)$.

Example 2. $S_1 = \{f, g\}$, $S_2 = \{f, j\}$ and $S_3 = \{f, h\}$ defend $h$
against the attack of $g$. $S_2$ and $S_3$ are conflict-free, but $S_1$ is not.
Hence $S_1$ cannot be an admissible set, nor can $S_2$, which cannot defend its
argument $j$ against the attack of $h$. $S_3$ is able to defend every argument
it contains, thus $S_3$ is admissible. $AF_1$ possesses two preferred
extensions: $\{b, d, f, h\}$ and $\{a, f, h\}$.
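
Read naively, Definition 3 gives a brute-force procedure: enumerate all subsets, keep the admissible ones (conditions 1 and 2 of Proposition 1), and retain those that are maximal for inclusion. The sketch below does exactly that; it is exponential and only meant to make the definition concrete. The function names and the small framework are assumptions of this example, since the attack graph of Example 1 is not reproduced here.

```python
from itertools import combinations

def successors(R, S):
    return {b for (a, b) in R if a in S}

def predecessors(R, S):
    return {a for (a, b) in R if b in S}

def admissible(R, S):
    # Conditions 1 and 2 of Proposition 1.
    return not (successors(R, S) & S) and predecessors(R, S) <= successors(R, S)

def preferred_extensions(A, R):
    subsets = [set(c) for k in range(len(A) + 1)
               for c in combinations(sorted(A), k)]
    adm = [S for S in subsets if admissible(R, S)]
    # Keep the admissible sets that have no admissible proper superset.
    return [S for S in adm if not any(S < T for T in adm)]

# Hypothetical framework with one self-attacking argument e.
A = {"a", "b", "c", "d", "e"}
R = {("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"), ("e", "e")}
print(preferred_extensions(A, R))
```

The cycle between a and b leads to two preferred extensions, {a, c} and {b, d}; the self-attacking argument e belongs to neither.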

Dung [Dun95] exhibits interesting properties of the preferred semantics: in
particular, every admissible set is contained in a preferred extension, and
every argumentation framework possesses at least one preferred extension.
Moreover, a finite argumentation framework without cycles has exactly one
preferred extension. Working with frameworks containing cycles is difficult,
since they generally lead to multiple extensions. These difficulties will be
illustrated with our example $AF_1$.

We want to answer in this paper two important questions on preferred
extensions: given an argument and an argumentation framework $(A, R)$, is the
argument in at least one preferred extension of $(A, R)$? Is it in every
preferred extension? Or equivalently, is the argument a credulous or a
sceptical consequence of $(A, R)$? We formally define these notions:

Definition 4. Given an argumentation framework $(A, R)$ and an argument
$a \in A$, $a$ is a credulous consequence of $(A, R)$ if and only if $a$ is
contained in the union of the preferred extensions of $(A, R)$; $a$ is a
sceptical consequence of $(A, R)$ if and only if $a$ is contained in the
intersection of all the preferred extensions of $(A, R)$.

Example 3. $b$, $a$, $d$, $f$ and $h$ are credulous consequences of $AF_1$; no
other argument of $AF_1$ is. The only sceptical consequences of $AF_1$ are $f$
and $h$.
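
Once the preferred extensions are known, Definition 4 reduces to a union and an intersection. The short sketch below takes the two preferred extensions of $AF_1$ listed in Example 2 as given input; the function names are illustrative.

```python
def credulous(extensions):
    """Arguments contained in at least one extension."""
    return set().union(*extensions)

def sceptical(extensions):
    """Arguments contained in every extension (at least one extension assumed)."""
    return set.intersection(*extensions)

# The two preferred extensions of AF_1 given in Example 2.
pref_af1 = [{"b", "d", "f", "h"}, {"a", "f", "h"}]
print(sorted(credulous(pref_af1)), sorted(sceptical(pref_af1)))
```

This reproduces Example 3. It presupposes that all extensions have been computed, whereas the point of the algorithms in this paper is to answer both questions without always generating every extension.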

3 Extension Enumeration 

Before we describe our algorithms in the next section, we present here the general 
technique on which they are based, and formal properties that will be used to 
speed-up the computations. 

The enumeration of the subsets of a given set $A$ can be performed by exploring
a binary tree, the nodes of which are labeled by a partition of $A$ into three
sets $I$, $O$, and $U$. If $U = \emptyset$, the node is a leaf, corresponding
to the subset $I$ of $A$. More generally, at any given node $n$, $I$ is a set
of elements of $A$ that will be In every subset of $A$ found in the subtree the
root of which is $n$, and $O$ is a set of elements of $A$ that will be Out of
every subset of $A$ found in the same subtree, while $U = A - (I \cup O)$; thus
$U$ is a set of elements that are Undecided at that stage: they can end up in
some of the sets of the subtree rooted at the current node and out of some
other sets in that subtree. If $n$ is such that $U \neq \emptyset$, then $n$
has at most two children: one is labelled with the partition
$(I \cup \{x\}, O, U - \{x\})$, the other is labelled with the partition
$(I, O \cup \{x\}, U - \{x\})$ for some $x \in U$.
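
The binary tree just described corresponds to a simple recursive generator. In this sketch the undecided element x is chosen as the smallest element of U, which is an arbitrary choice made here for determinism; the name `enumerate_subsets` is likewise an assumption.

```python
def enumerate_subsets(I, O, U):
    """Yield every subset S with I <= S <= I | U by exploring the (I, O, U) tree."""
    if not U:                      # leaf: the node stands for the subset I
        yield set(I)
        return
    x = min(U)                     # pick some undecided element
    yield from enumerate_subsets(I | {x}, O, U - {x})    # child where x is In
    yield from enumerate_subsets(I, O | {x}, U - {x})    # child where x is Out

subsets = list(enumerate_subsets(set(), set(), {1, 2, 3}))
print(subsets)
```

Without pruning, a set of n elements yields 2^n leaves, which is why the properties studied in the rest of the section matter.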

Since we will be interested in the preferred extensions of some argumentation
framework $(A, R)$, only some subsets of $A$ are of interest. Even in this
case, it would be possible to enumerate all subsets of $A$, and test for each
of them if it verifies a given property. However, it can happen that it is
sufficient to generate only one child, or no child at all, for some nodes.

Before we investigate more closely the enumeration of preferred extensions,
let us introduce notations that will lighten the presentation of our results.
Preferred extensions are conflict-free, so it is legitimate to generate only
nodes labelled by triples $(I, O, U)$ such that $R^{\pm}(I) \subseteq O$: since
every preferred extension $S$ found in the subtree rooted at that node must be
conflict-free (cf. Def. 3) and verifies $I \subseteq S$, it cannot contain any
element of $R^{\pm}(I)$; since $S \subseteq I \cup U$, one way to ensure that
$S$ will be conflict-free is to explore only nodes such that
$R^{\pm}(I) \subseteq O$. We call R-candidate a triple of disjoint sets
$(I, O, U)$ such that $R^{\pm}(I) \subseteq O$. Given such an R-candidate
$C = (I, O, U)$ and an element $x$ of $U$, we denote by $C + x$ the triple
$(I \cup \{x\}, O \cup R^{\pm}(x), U - (\{x\} \cup R^{\pm}(x)))$, and by
$C - x$ the triple $(I, O \cup \{x\}, U - \{x\})$. Given a binary relation $R$
and an R-candidate $C = (I, O, U)$, we denote by $\mathrm{Pref}^{*}(C, R)$ the
set of preferred extensions that are in the subtree rooted at a node labelled
by $(I, O, U)$:

$$\mathrm{Pref}^{*}((I, O, U), R) = \{S \in \mathrm{Pref}(I \cup O \cup U, R) \mid I \subseteq S \subseteq I \cup U\}.$$



Our first result shows that we can have a complete enumeration while
exploring only nodes that are $R$-candidates:

Proposition 2. Let $R$ be a binary relation, let $C = (I, O, U)$ be an $R$-candidate,
and let $x \in U$. If $x \notin R^{\pm}(x)$, then $C + x$ and $C - x$ are both $R$-candidates, and
$\mathrm{Pref}^*(C, R) = \mathrm{Pref}^*(C + x, R) \cup \mathrm{Pref}^*(C - x, R)$. If $x \in R^{+}(x)$, then $C - x$ is
an $R$-candidate, and $\mathrm{Pref}^*(C, R) = \mathrm{Pref}^*(C - x, R)$.

Essentially, our algorithms will therefore select, at each stage, an undecided
argument $x$ and generate two new $R$-candidates, by putting $x$ in $I$ or in $O$ (and
adding predecessors and successors of $x$ to $O$ in the first case), thereby emptying
$U$. Note that arguments that attack themselves are particular, since they can be
in no preferred extension. In the sequel, we denote by $\mathrm{Refl}(A, R)$ the set of these
arguments:

$\mathrm{Refl}(A, R) = \{x \in A \mid x \in R^{+}(x)\}$.

When $U$ is empty, we need to check if the leaf corresponds to a preferred
extension. Let $(I, O, \emptyset)$ be the $R$-candidate that we find at the leaf; then $I$ is
conflict-free because $R^{\pm}(I) \subseteq O$. We need to check that $I$ defends itself, so that
$I$ is admissible, and some maximality condition: it must be the case that for
every non-empty subset $X$ of $O$, $I \cup X$ is not admissible. We define the following property:

$\mathrm{Max}(I, O, U, R) = \forall X \subseteq O,\; X = \emptyset \,\vee\, X \cap R^{+}(X) \neq \emptyset \,\vee\, R^{-}(X) \not\subseteq R^{+}(I \cup U \cup X)$.



Proposition 3. Let $R$ be a binary relation, and let $C = (I, O, U)$ be an
$R$-candidate such that $U = \emptyset$. If $R^{-}(I) \subseteq R^{+}(I)$, and if $\mathrm{Max}(I, O - R^{\pm}(I), \emptyset, R)$
holds, then $\mathrm{Pref}^*(C, R) = \{I\}$, otherwise $\mathrm{Pref}^*(C, R) = \emptyset$.
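
The leaf test of Proposition 3 can be sketched as follows. This is an illustrative brute-force version (it enumerates all subsets of $O$, so it is exponential); the names `max_property` and `leaf_extensions` and the pair-set encoding of $R$ are assumptions of this sketch, not part of the paper.

```python
from itertools import combinations

def succ(R, X):
    """R^+(X): arguments attacked by some element of X."""
    return frozenset(y for (x, y) in R if x in X)

def pred(R, X):
    """R^-(X): arguments that attack some element of X."""
    return frozenset(x for (x, y) in R if y in X)

def subsets(S):
    """All subsets of S, smallest first."""
    items = sorted(S)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

def max_property(I, O, U, R):
    """Max(I, O, U, R): every non-empty X ⊆ O is either not conflict-free
    or not defended by I ∪ U ∪ X."""
    for X in subsets(O):
        if not X:
            continue
        conflict = bool(X & succ(R, X))
        undefended = not (pred(R, X) <= succ(R, I | U | X))
        if not (conflict or undefended):
            return False
    return True

def leaf_extensions(I, O, R):
    """Pref*((I, O, ∅), R) following Prop. 3: either [I] or []."""
    admissible = pred(R, I) <= succ(R, I)
    O_u = O - (succ(R, I) | pred(R, I))
    if admissible and max_property(I, O_u, frozenset(), R):
        return [frozenset(I)]
    return []
```

On the four-argument framework with attacks a→b, b→c, c→d, d→b, the leaf with $I = \{a, c\}$ passes the test, while the leaf with $I = \{a\}$ fails it because $\{c\}$ could still be added.
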

In the remainder of the section, we study properties that can be applied to prune
the binary search. First we have strong properties that can be used to stop the 
search at a given stage, because they guarantee the existence or the absence of 
preferred extensions that verify certain conditions. These properties will be used 
in the query answering algorithms, that do not need to exhibit extensions, but 
only test if there is some or no extension that contains a particular argument. 

Our first condition guarantees that there can be no preferred extension in the
subtree rooted at some node: since a preferred extension must defend itself, every
element of $I$ at a given stage must be defensible; so every predecessor of $I$ must
have predecessors that have not already been put out of the extensions currently
explored; if this is not the case, then there is no hope of finding a preferred
extension in the subtree whose root is the current node. Conversely, there is a
condition that guarantees that there exists a preferred extension that contains
a set $I$: if every element of $I$ is defended by $I$, that is, if every predecessor of $I$
is also a successor of $I$, then $I$ is admissible, therefore contained in at least one
preferred extension (which may not always be found in the subtree rooted at the
current node, since too many elements may have already been put in $O$, thereby
preventing $I$ from growing until it becomes maximally admissible). Formally, the
following holds:

Proposition 4. Let $R$ be a binary relation, let $C = (I, O, U)$ be an $R$-candidate.
If there exists $x \in R^{-}(I) - R^{+}(I)$ such that $R^{-}(x) \subseteq O$, then $\mathrm{Pref}^*(C, R) = \emptyset$.
Otherwise, if $R^{-}(I) - R^{+}(I) = \emptyset$, then $\mathrm{Pref}(I \cup O \cup U, R) \neq \emptyset$ and $I$ is admissible.

In particular, these conditions can be used to check if a given argument $a \in I$
is contained in at least one preferred extension. However, if we want to check if
there is some extension that does not contain a given argument $a \in O$, then we
need a stronger property:

Proposition 5. Let $R$ be a binary relation, $(I, O, U)$ be an $R$-candidate such
that $R^{-}(I) - R^{+}(I) = \emptyset$. If $a \in R^{+}(I)$, or if $a \in O - R^{\pm}(I)$ and
$\mathrm{Max}(I, O - R^{\pm}(I), U, R)$ holds, then there exists $S \in \mathrm{Pref}(I \cup O \cup U, R)$ such that $a \notin S$.

Let us now turn to properties that will enable us to prune one half of the subtrees
rooted at some nodes. That is, we exhibit conditions that guarantee that a given
$x \in U$ is such that every extension $S$ such that $I \subseteq S \subseteq I \cup U$ contains $x$, or
such that none of these extensions contains $x$. We define the following sets:

$\mathrm{App}(I, U, R) = \{x \in U \mid x \notin R^{\pm}(I) \cup R^{+}(U) \text{ and } R^{-}(x) \subseteq R^{+}(I \cup \{x\})\}$

$\mathrm{Undef}(I, U, R) = \{x \in U \mid R^{-}(x) \not\subseteq R^{+}(I \cup U)\}$

$\mathrm{Heroes}(I, U, R) = \{x \in U \mid \exists z \in I, \exists y \in R^{-}(z), \{x\} = R^{-}(y) \cap (U \cup I)\}$

$\mathrm{Traitors}(I, O, U, R) = \{x \in U \mid \exists y \in R^{+}(x), \exists z \in O \cap R^{+}(y),\; z \notin R^{+}(I \cup U \cup \{z\}) \text{ and } R^{-}(z) - \{y\} \subseteq R^{+}(I)\}$

$\mathrm{App}(I, U, R)$ is the set of undecided elements that cannot be in conflict with
any extension $S$ found in the subtree rooted at the current node (since
$x \notin R^{+}(I \cup U) \supseteq R^{+}(S)$), and who are already defended by $I$ or defend themselves
($R^{-}(x) \subseteq R^{+}(I \cup \{x\})$): these elements will be in every preferred extension in
that subtree. Similarly, the elements of $\mathrm{Undef}(I, U, R)$ will be in no extension $S$
in that subtree, since they cannot be defended anymore: not all their predecessors
can be attacked ($R^{-}(x) \not\subseteq R^{+}(I \cup U) \supseteq R^{+}(S)$).

Elements of $\mathrm{App}(I, U, R)$ and of $\mathrm{Undef}(I, U, R)$ will be added to $I$ and $O$
respectively, and their fate will be sealed (at least in the current subtree). However,
when no such rule can be applied, we may have to create two branches, in which
case we will add an element $z$ to $I$ in one branch, to $O$ in the other, without any
certainty that it truly belongs where it is put; in fact, the presence of that
element in one set or the other may only become fully justified when the fate of
other elements is decided. In particular, if $z \in I$ only has one potential defender
$x$ left, then $x$ must be added to $I$. $\mathrm{Heroes}(I, U, R)$ is the set of such last defenders
of some elements of $I$. Conversely, if $z \in O$ has no attacker in $I$, then it must
not be defensible by $I$; so if $z$ is already defended against all its attackers except
one, whose attackers are not already in $O$, then this last potential defender of
$z$ must not be in $I$. $\mathrm{Traitors}(I, O, U, R)$ is the set of those undecided elements
who might have gone to $I$, but go to $O$ in order to support another element of $O$.
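
The four pruning sets can be computed directly from their definitions. The sketch below is illustrative only: it follows the set definitions as read here (in particular the superscripts in App and Traitors, which are hard to recover from the source), and the function names and the pair-set encoding of $R$ are assumptions.

```python
def succ(R, X):
    """R^+(X): arguments attacked by some element of X."""
    return frozenset(y for (x, y) in R if x in X)

def pred(R, X):
    """R^-(X): arguments that attack some element of X."""
    return frozenset(x for (x, y) in R if y in X)

def app(I, U, R):
    """Undecided x, free of conflict with I, unattacked by U, and defended by I ∪ {x}."""
    forbidden = succ(R, I) | pred(R, I) | succ(R, U)
    return frozenset(x for x in U
                     if x not in forbidden
                     and pred(R, {x}) <= succ(R, I | {x}))

def undef(I, U, R):
    """Undecided x with an attacker that nothing in I ∪ U can attack."""
    reach = succ(R, I | U)
    return frozenset(x for x in U if not pred(R, {x}) <= reach)

def heroes(I, U, R):
    """Undecided x that is the last possible defender of some z in I."""
    out = set()
    for z in I:
        for y in pred(R, {z}):
            last = pred(R, {y}) & (U | I)
            if len(last) == 1 and last <= U:
                out |= last
    return frozenset(out)

def traitors(I, O, U, R):
    """Undecided x that would complete the defence of some rejected z."""
    out = set()
    for x in U:
        for y in succ(R, {x}):
            for z in O & succ(R, {y}):
                if (z not in succ(R, I | U | {z})
                        and pred(R, {z}) - {y} <= succ(R, I)):
                    out.add(x)
    return frozenset(out)
```

On the framework a→b, b→c, c→d, d→b with everything undecided, `app` returns `{a}` and `undef` returns `{b}`; once `a` is In and `b` is Out, `app` returns `{c}` and `undef` returns `{d}`, so no branching is needed.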

Proposition 6. Let $R$ be a binary relation, let $C = (I, O, U)$ be an $R$-candidate,
and let $x \in U$ such that $x \notin R^{\pm}(x)$. Then:

1. if $x \in \mathrm{App}(I, U, R) \cup \mathrm{Heroes}(I, U, R)$, then $\mathrm{Pref}^*(C, R) = \mathrm{Pref}^*(C + x, R)$;

2. if $x \in \mathrm{Undef}(I, U, R) \cup \mathrm{Traitors}(I, O, U, R)$, then $\mathrm{Pref}^*(C, R) = \mathrm{Pref}^*(C - x, R)$.

Prop. 6 gives conditions under which only one child of a given node can be 
generated: this interesting feature relies on the possibility to select an undecided 
element that has some particular property. If no such undecided element can be 
found, then we may have to generate two branches. The question that arises is 
if this can be further avoided, since one can expect the enumeration in a subtree 
whose root has two children to be on average twice as hard as in a subtree whose 
root only has one child. Recall that an argumentation framework that has no 
cycle only has one preferred extension. So it is natural to check if the set of 
undecided elements at a given node is cycle-free: we can hope that in this case 
there can only be one extension in the corresponding subtree. It is indeed the 
case: 

Proposition 7. Let $R$ be a finite binary relation, and let $C = (I, O, U)$ be an
$R$-candidate such that $\mathrm{App}(I, U, R) = \mathrm{Undef}(I, U, R) = \emptyset$ and $U$ is cycle-free.
Then $\mathrm{Pref}^*(C, R) = \mathrm{Pref}^*((I \cup U, O, \emptyset), R)$.
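
Checking whether $U$ is cycle-free amounts to testing acyclicity of the graph restricted to $U$. A minimal sketch using repeated removal of vertices without incoming edges (Kahn's method, chosen here for brevity and not taken from the paper) could look like this; the name `is_cycle_free` and the pair-set encoding of $R$ are assumptions.

```python
def is_cycle_free(U, R):
    """True if the restriction of R to U contains no cycle.
    A self-attacking argument counts as a cycle of length one."""
    U = set(U)
    edges = [(x, y) for (x, y) in R if x in U and y in U]
    indegree = {u: 0 for u in U}
    for _, y in edges:
        indegree[y] += 1
    ready = [u for u in U if indegree[u] == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for (x, y) in edges:
            if x == u:
                indegree[y] -= 1
                if indegree[y] == 0:
                    ready.append(y)
    # Vertices on a cycle never reach in-degree zero, so they are never removed.
    return removed == len(U)
```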

4 The Algorithms 

The general extension enumeration algorithm is quite straightforward. We
describe it by means of a recursive function PrefEnum, which, given a binary
relation $R$ and an $R$-candidate $C = (I, O, U)$, returns true if $\mathrm{Pref}^*(C, R) \neq \emptyset$,
and returns false otherwise. At each step, the function tests if the answer can
be found at the current node: StopCondTrue($C$, $R$) tests if the answer is true,
StopCondFalse($C$, $R$) tests if it is false. If there is no easy answer, then the
function generally calls itself on $(C + x, R)$ and on $(C - x, R)$, for a selected element
$x$ of $U$. This selection of the branching $x$ is crucial, since it may be the case that
for a good choice of $x$, only one branch needs to be explored (cf. Prop. 6): for
this reason, the selection function Select($C$, $R$) returns a triple $(x, b_I, b_O)$, where
$b_I$ and $b_O$ are two booleans, indicating respectively if the branch in which $x$ is
added to $I$, and the branch in which $x$ is added to $O$, have to be explored. When
$U$ is empty or contains no more cycle (Prop. 7), the selection function returns
$b_I$ and $b_O$ false and it only remains to test if $I \cup U$ is a preferred extension of
$(I \cup U \cup O, R)$; this is the role of the condition FinalTest($I \cup U$, $O$, $R$).

Function PrefEnum($R$, $C$)

Param. a binary relation $R$, an $R$-candidate $C = (I, O, U)$

Result $\top$ if an extension is found, $\bot$ otherwise

1. if StopCondTrue($C$, $R$) then $\top$;

2. else if StopCondFalse($C$, $R$) then $\bot$;

3. else

a) $(x, b_I, b_O) \leftarrow$ Select($C$, $R$);

b) if $\neg b_I \wedge \neg b_O$ then FinalTest($I \cup U$, $O$, $R$)

c) else ($b_I \wedge$ PrefEnum($R$, $C + x$)) $\vee$ ($b_O \wedge$ PrefEnum($R$, $C - x$));

Step 3c of the algorithm leaves unspecified the order in which the branches are 
explored when two branches have to be explored. The choice of the order may 
have a big influence on the efficiency of the algorithm, since if a positive answer 
is found on the first branch that is explored, then the other one does not have 
to be explored. 
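
The skeleton translates almost line for line into Python, with the four problem-specific pieces passed as functions. The sketch below is an illustration under assumptions: the helper names, the pair-set encoding of $R$, and the demonstration (a plain stable-extension existence test with a naive selection function and no pruning) are choices made here, not the paper's instantiations.

```python
def succ(R, X):
    return frozenset(y for (x, y) in R if x in X)

def pred(R, X):
    return frozenset(x for (x, y) in R if y in X)

def plus(R, C, x):
    I, O, U = C
    n = succ(R, {x}) | pred(R, {x})
    return (I | {x}, O | n, U - ({x} | n))

def minus(C, x):
    I, O, U = C
    return (I, O | {x}, U - {x})

def pref_enum(R, C, stop_true, stop_false, select, final_test):
    """Generic search skeleton; Python's `or` explores the In branch first
    and skips the Out branch as soon as the In branch succeeds."""
    if stop_true(C, R):
        return True
    if stop_false(C, R):
        return False
    x, b_I, b_O = select(C, R)
    I, O, U = C
    if not b_I and not b_O:
        return final_test(I | U, O, R)
    rec = lambda D: pref_enum(R, D, stop_true, stop_false, select, final_test)
    return (b_I and rec(plus(R, C, x))) or (b_O and rec(minus(C, x)))

# --- Demonstration with deliberately naive components (assumptions) ---

def naive_select(C, R):
    """Pick the smallest undecided argument; never put a self-attacker In."""
    I, O, U = C
    if not U:
        return (None, False, False)
    x = min(U)
    return (x, (x, x) not in R, True)

def has_stable_extension(A, R):
    """A leaf (S, O, ∅) is stable when every rejected argument is attacked by S."""
    never = lambda C, R: False
    stable_leaf = lambda S, O, R: O <= succ(R, S)
    start = (frozenset(), frozenset(), frozenset(A))
    return pref_enum(R, start, never, never, naive_select, stable_leaf)
```

Swapping in other stop conditions, selection functions and final tests gives the query answering and enumeration variants described in the text.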

This skeleton of algorithm is used to answer the credulous and the sceptical 
query answering problems (that is, checking if an argument is a credulous or 
a sceptical consequence of an argumentation framework), and also to compute 
preferred extensions. Note that a self-attacking argument does not belong to 
any preferred extension: we directly answer false to a credulous or a sceptical 
query on such an argument. For the computation, and for all the other credulous 
and sceptical queries, we describe precisely the call of PrefEnum, the conditions 
StopCondTrue, StopCondFalse and FinalTest and the function Select. For this 
last function, we use a function Choose which, applied to a non-empty set (of 
arguments or of sets of arguments), returns an unspecified element of the set. 
We also use a function Cycles which, given a set X of arguments, returns the 
set of the sets of arguments which compose cycles in X. 

4.1 Credulous Query Answering 

Given an argumentation framework $(A, R)$ and an element $a$ of $A$, we want to
check if $a$ is in a preferred extension of $(A, R)$. Since we want to find an extension
that contains $a$, we call the extension enumeration function PrefEnum on $R$ and
on the $R$-candidate $(\emptyset, \emptyset, A) + a$. The function returns $\top$ if an extension is found,
$\bot$ otherwise. According to Prop. 4, the stop conditions are:

StopCondTrue($C$, $R$) = $R^{-}(I) - R^{+}(I) = \emptyset$ ($I$ is an admissible set)

StopCondFalse($C$, $R$) = $\exists x \in R^{-}(I) - R^{+}(I)$ such that $R^{-}(x) \subseteq O$
($I$ cannot defend all its elements)

Our strategy to answer this problem is to try to empty the set $R^{-}(I) - R^{+}(I)$,
that we denote in the sequel by $O_p$ ($O_p$ is a subset of $O$). This set contains
predecessors of $I$ that are not (yet) successors of $I$. To empty it, we look at the
predecessors of its elements and we try to make them go in $I$. This is possible
only if the predecessors are not self-attacking (since $I$ must be conflict-free):
self-attacking predecessors are put in $O$. Otherwise, we check if undecided
predecessors of $O_p$ belong to $\mathrm{Heroes}(I, U, R)$ or $\mathrm{App}(I, U, R)$, in which case they
go in $I$ (Prop. 6). If none of these pruning properties is applicable, we select an
undecided predecessor of $O_p$. The Select function is the following:

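Function SelectCred($C$, $R$)

Param. a binary relation $R$, an $R$-candidate $C = (I, O, U)$

Result an argument and two booleans

1. if $\mathrm{Refl}(U \cap R^{-}(O_p), R) \neq \emptyset$ then (Choose($\mathrm{Refl}(U \cap R^{-}(O_p), R)$), $\bot$, $\top$)

2. else if $\mathrm{Heroes}(I, U, R) \neq \emptyset$ then (Choose($\mathrm{Heroes}(I, U, R)$), $\top$, $\bot$)

3. else if $\mathrm{App}(I, U, R) \cap R^{-}(O_p) \neq \emptyset$ then (Choose($\mathrm{App}(I, U, R) \cap R^{-}(O_p)$), $\top$, $\bot$)

4. else (Choose($U \cap R^{-}(O_p)$), $\top$, $\top$);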

As the Select function never returns $b_I$ and $b_O$ both false, the function FinalTest will
never be used. Let us now run the algorithm on two examples. We describe the
call of PrefEnum and we show on the graph representation of the argumentation
framework which arguments are in $I$ (circled), in $O$ (crossed) or in $U$ at the end
of the run.
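
A simplified version of the credulous procedure can be sketched as follows. It keeps the two stop conditions and the rule that self-attacking predecessors go Out, but it omits the Heroes and App pruning steps and always branches on the smallest undecided predecessor of $O_p$; these simplifications, the function names and the pair-set encoding of $R$ are assumptions of this sketch.

```python
def succ(R, X):
    return frozenset(y for (x, y) in R if x in X)

def pred(R, X):
    return frozenset(x for (x, y) in R if y in X)

def plus(R, C, x):
    I, O, U = C
    n = succ(R, {x}) | pred(R, {x})
    return (I | {x}, O | n, U - ({x} | n))

def minus(C, x):
    I, O, U = C
    return (I, O | {x}, U - {x})

def search(R, C):
    I, O, U = C
    O_p = pred(R, I) - succ(R, I)          # attackers of I not yet counter-attacked
    if not O_p:
        return True                        # StopCondTrue: I is admissible
    if any(pred(R, {x}) <= O for x in O_p):
        return False                       # StopCondFalse: some attacker is unbeatable
    # Undecided potential defenders; non-empty because StopCondFalse failed.
    defenders = sorted(U & pred(R, O_p))
    for x in defenders:
        if (x, x) in R:                    # self-attacking: can only go Out
            return search(R, minus(C, x))
    x = defenders[0]
    return search(R, plus(R, C, x)) or search(R, minus(C, x))

def credulous(A, R, a):
    """Is argument a contained in at least one preferred extension of (A, R)?"""
    if (a, a) in R:
        return False
    start = (frozenset(), frozenset(), frozenset(A))
    return search(R, plus(R, start, a))
```

Because an admissible set is always contained in some preferred extension, the search can stop as soon as $O_p$ is empty, without any maximality test.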

Example 4. Given the argumentation framework $AF_1$, is there a preferred
extension containing $d$? We call PrefEnum on $R$ and on $(\emptyset, \emptyset, A) + d$. Since $d$ is in
$I$, $c$ and $e$ go in $O$, because they are respectively predecessor and successor of $d$.
Argument $c$ belongs to $O_p$ and it has only one undecided predecessor: $b$.
Argument $b$ is in fact the only potential defender of $d$: it belongs to $\mathrm{Heroes}(I, U, R)$.
So $b$ goes in $I$ and consequently its successor-predecessor $a$ goes in $O$. Now there
is no more argument in $O$ without predecessor in $I$. StopCondTrue($(I, O, U)$, $R$)
is true, we have found an admissible set: $\{d, b\}$, so $d$ belongs to a preferred
extension.

Given $AF_1$, is there a preferred extension containing $c$? We call PrefEnum on
$R$ and on $(\emptyset, \emptyset, A) + c$. Since $c$ is in $I$, arguments $b$, $d$ and $e$ go in $O$. Argument
$e$ is in $O_p$ because it is a predecessor but not a successor of $I$. In fact, its only
predecessor is in $O$: we are in the case where $I$ cannot defend all its elements.
StopCondFalse($(I, O, U)$, $R$) is true, so there is no preferred extension containing $c$.

[Figure: graph representation of $AF_1$ at the end of the run.]

4.2 Sceptical Query Answering 

Given an argumentation framework $(A, R)$ and an element $a$ of $A$, we want to
check if $a$ is in every preferred extension of $(A, R)$. To solve this problem, we look
at its complement: we look for an extension not containing $a$. To this end, we
call the extension enumeration function PrefEnum on $R$ and on the $R$-candidate
$(\emptyset, \emptyset, A) - a$, and we try to build an extension. If PrefEnum($R$, $(\emptyset, \emptyset, A) - a$) $= \bot$,
then $a$ is in every preferred extension, otherwise it is not.

We keep the notation $O_p = R^{-}(I) - R^{+}(I)$ and we introduce a new one:
$O_u = O - R^{\pm}(I)$. The stop conditions are the following, according to Prop. 4:

StopCondTrue($C$, $R$) = $O_p = \emptyset \wedge (a \in R^{+}(I) \vee (a \in O_u \wedge \mathrm{Max}(I, O_u, U, R)))$ ($a$
is attacked by an admissible set or $a$ cannot be defended by any preferred
extension containing $I$);

StopCondFalse($C$, $R$) = $(\exists x \in O_p, R^{-}(x) \subseteq O) \vee \neg\mathrm{Max}(I, O_u, \emptyset, R)$ ($I$ cannot
defend all its elements or $I$ will never be maximal).

Our strategy to answer a sceptical query is the following: we try to show that
$a$ is attacked by an admissible set, and if it is not possible, we try to show that
it is not defensible. We give the Select function and then we comment on it:

Function $\mathrm{SelectScept}_a$($C$, $R$)

Param. a binary relation $R$, an $R$-candidate $C = (I, O, U)$

Result an argument and two booleans

1. if $a \in R^{+}(I)$ then SelectCred($C$, $R$)

2. else if $\mathrm{Traitors}(I, O, U, R) \neq \emptyset$ then (Choose($\mathrm{Traitors}(I, O, U, R)$), $\bot$, $\top$)

3. else if $\mathrm{Refl}(U \cap R^{-}(O_u), R) \neq \emptyset$ then (Choose($\mathrm{Refl}(U \cap R^{-}(O_u), R)$), $\bot$, $\top$)

4. else if $\mathrm{Undef}(I, U, R) \cap R^{-}(O_u) \neq \emptyset$ then (Choose($\mathrm{Undef}(I, U, R) \cap R^{-}(O_u)$), $\bot$, $\top$)

5. else if $\mathrm{App}(I, U, R) \cap R^{-}(O_u) \neq \emptyset$ then (Choose($\mathrm{App}(I, U, R) \cap R^{-}(O_u)$), $\top$, $\bot$)

6. else if $\mathrm{Cycles}(U) = \emptyset$ then (Choose($U$), $\bot$, $\bot$)

7. else if $\exists e \in \mathrm{Cycles}(U)$ such that $e \cap R^{-}(O_u) \neq \emptyset$ then (Choose($e \cap R^{-}(O_u)$), $\top$, $\top$)

8. else if $U \cap R^{-}(O_u) \neq \emptyset$ then (Choose($U \cap R^{-}(O_u)$), $\top$, $\top$)

9. else (Choose(Choose($\mathrm{Cycles}(U)$)), $\top$, $\top$);

When we call the function PrefEnum, $a$ is in $O_u$: it is rejected, but it is neither
successor nor predecessor of $I$. First of all, we try to show that $a$ is not defensible.
To this end, we use the pruning conditions Traitors, Refl and Undef (instructions
2, 3 and 4). Traitors directly acts on predecessors, whereas we must choose
in Refl and Undef arguments which are predecessors of $O_u$. If we cannot show
that $a$ is not defensible, then we try to make $a$ a successor of $I$, thanks to the
pruning condition App (instruction 5). If we succeed, then we try to build an
admissible set just like we did for the credulous query answering (instruction
1). Finally, if no pruning condition can be applied, we check if $U$ is cycle-free
(instruction 6). Then the function PrefEnum will do the following final test on
the set $I \cup U$:

FinalTest($I \cup U$, $O$, $R$) = $\mathrm{Max}(I \cup U, O_u, \emptyset, R)$

If $U$ is not cycle-free, then we choose an undecided predecessor of $O_u$, possibly
cutting a cycle (instructions 7 and 8). If $U \cap R^{-}(O_u)$ is empty, we choose an undecided
argument cutting a cycle (instruction 9).

Example 5. Given the argumentation framework $AF_1$, is argument $d$ contained
in every preferred extension? We call PrefEnum on $R$ and on $(\emptyset, \emptyset, A) - d$.
Argument $d$ is in $O_u$. Its only predecessor, $c$, cannot be defended against its attacker
$e$, because its only defender was $d$: $c$ belongs to $\mathrm{Undef}(I, U, R) \cap R^{-}(O_u)$, so
it goes in $O$, precisely in $O_u$. Argument $e$ is a predecessor of $c$, and its only
defender against $d$ was $c$: $e$ belongs to $\mathrm{Undef}(I, U, R) \cap R^{-}(O_u)$ and goes in $O_u$.
Now argument $d$ has still a potential defender against $c$, the undecided
argument $b$. This argument must go in $O$, since we want to build an extension not
containing $d$: $b$ belongs to $\mathrm{Traitors}(I, O, U, R)$. $O_p$ is empty and no argument of the
set $O_u$ is defensible. StopCondTrue($(I, O, U)$, $R$) is true, there exists a preferred
extension not containing $d$.

[Figure: graph representation of $AF_1$ at the end of the run; the surviving fragment shows the arguments $f$, $g$, $h$, $j$ and $k$.]

Given $AF_1$, is argument $h$ contained in every preferred extension? We call
PrefEnum on $R$ and on $(\emptyset, \emptyset, A) - h$. Argument $h$ has only one predecessor, $g$.
We cannot apply any pruning property to make $g$ go in $I$ or $O$. So we build
two branches, one where $g$ goes in $I$ (branch 1), the other one where it goes in
$O$ (branch 2). In branch 1, $f$ and $b$ go in $O$ because they are predecessors of $g$.
But now $f$ belongs to $O_p$, and it does not have any predecessor: $g$ will never be
defended against $f$. It means there is no extension in this branch. The graph
representation at the end of the run on branch 1 is the following:

In branch 2, $f$ and $b$ belong to $\mathrm{Traitors}(I, O, U, R)$ because they are
potential defenders of $h$ against $g$. If we select $f$, then $f$ goes in $O$. But
$\{f\} \subseteq O_u$ is conflict-free and $R^{-}(\{f\}) \subseteq R^{+}(I \cup \{f\})$ since $f$ has no predecessor:
StopCondFalse($(I, O, U)$, $R$) is true. This branch cannot lead to a preferred
extension because argument $f$ should be in $I$ for $I$ to be a maximal admissible set
at the end of the run. The graph representation at the end of the run on branch
2 is the following:




Consequently, since PrefEnum returns false in each branch, $h$ is in every
preferred extension of $AF_1$.

4.3 Computation of the Preferred Extensions 

We call PrefEnum on $R$ and on the $R$-candidate $(\emptyset, \emptyset, A)$. The stop conditions
are the following:

StopCondTrue($C$, $R$) = $\bot$;

StopCondFalse($C$, $R$) = $\exists x \in O_p, R^{-}(x) \subseteq O$ ($I$ cannot defend all its
elements).

StopCondTrue($C$, $R$) is always false, because we can say we have found a
preferred extension only when the status of every argument is decided; this test is
done by FinalTest($I \cup U$, $O$, $R$) at the end of the run. Our strategy is to apply
at first the pruning conditions that make arguments go in $I$ or $O$ in a justified
way (looking at the arguments of $\mathrm{App}(I, U, R)$ and $\mathrm{Undef}(I, U, R)$). According
to Prop. 7, if these two conditions are not applicable, then we check if $U$ is
cycle-free. If $U$ is not cycle-free, we look for Heroes and Traitors to justify the presence
of arguments put in $I$ or $O$ at a branching point. Finally, if none of these pruning
conditions is applicable, we choose an undecided argument that cuts a cycle.

Function Select($C$, $R$)

Param. a binary relation $R$, an $R$-candidate $C = (I, O, U)$

Result an argument and two booleans

1. if $\mathrm{Refl}(U, R) \neq \emptyset$ then (Choose($\mathrm{Refl}(U, R)$), $\bot$, $\top$)

2. else if $\mathrm{App}(I, U, R) \neq \emptyset$ then (Choose($\mathrm{App}(I, U, R)$), $\top$, $\bot$)

3. else if $\mathrm{Undef}(I, U, R) \neq \emptyset$ then (Choose($\mathrm{Undef}(I, U, R)$), $\bot$, $\top$)

4. else if $\mathrm{Cycles}(U) = \emptyset$ then (Choose($U$), $\bot$, $\bot$)

5. else if $\mathrm{Heroes}(I, U, R) \neq \emptyset$ then (Choose($\mathrm{Heroes}(I, U, R)$), $\top$, $\bot$)

6. else if $\mathrm{Traitors}(I, O, U, R) \neq \emptyset$ then (Choose($\mathrm{Traitors}(I, O, U, R)$), $\bot$, $\top$)

7. else (Choose(Choose($\mathrm{Cycles}(U)$)), $\top$, $\top$);

Since we want to enumerate all the extensions, FinalTest must always be $\bot$,
otherwise the function PrefEnum could stop before it has finished enumerating
all the extensions. FinalTest must be written as a function that, as a side effect,
tests if $I$ is a preferred extension, and if true, outputs $I$. To this end, we introduce
the function Print.

Function FinalTest

Param. a triple of disjoint sets $(I, U, O)$, and a binary relation $R$

Result $\bot$

1. if $\mathrm{Max}(I \cup U, O_u, \emptyset, R)$ then Print($I$);

2. $\bot$.
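
To make the enumeration concrete, here is a small sketch that collects all preferred extensions by exploring $R$-candidates. It is a simplified illustration, not the algorithm above: it uses only the StopCondFalse pruning and the rule for self-attacking arguments, always branches on the smallest undecided argument, and checks maximality by brute force at each leaf. All names and the pair-set encoding of $R$ are assumptions.

```python
from itertools import combinations

def succ(R, X):
    return frozenset(y for (x, y) in R if x in X)

def pred(R, X):
    return frozenset(x for (x, y) in R if y in X)

def is_maximal(I, O_u, R):
    """No non-empty, conflict-free X ⊆ O_u is defended by I ∪ X."""
    items = sorted(O_u)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            X = frozenset(combo)
            if not (X & succ(R, X)) and pred(R, X) <= succ(R, I | X):
                return False
    return True

def preferred_extensions(A, R):
    found = []

    def explore(I, O, U):
        O_p = pred(R, I) - succ(R, I)
        if any(pred(R, {x}) <= O for x in O_p):
            return                          # I can no longer be defended
        if not U:                           # leaf: every status is decided
            O_u = O - (succ(R, I) | pred(R, I))
            if not O_p and is_maximal(I, O_u, R):
                found.append(I)             # plays the role of Print
            return
        x = min(U)
        if (x, x) not in R:                 # self-attackers never go In
            n = succ(R, {x}) | pred(R, {x})
            explore(I | {x}, O | n, U - ({x} | n))
        explore(I, O | {x}, U - {x})

    explore(frozenset(), frozenset(), frozenset(A))
    return found
```

The search never returns early on success, which mirrors the requirement that the enumeration must not stop at the first extension found.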



Example 6. Let us compute all the preferred extensions of $AF_1$. We call
PrefEnum on $R$ and on the $R$-candidate $(\emptyset, \emptyset, A)$. Argument $k$ is self-attacking,
so it goes in $O$. Argument $f$ is not attacked: it belongs to $\mathrm{App}(I, U, R)$, so it
goes in $I$. Since it is in $I$, its successor $g$ goes in $O$. Argument $h$ is attacked by
$g$ but it is defended by $f$; moreover, it is conflict-free with $I$, so it belongs to
$\mathrm{App}(I, U, R)$ and then goes in $I$. Consequently, $h$'s successor (argument $j$) goes
in $O$. Now, no more pruning property is applicable and $U$ is not cycle-free. The
graph representation of $AF_1$ is the following:




Then the Select function returns an argument that cuts a cycle, for example
argument $d$. We build two branches, one where $d$ goes in $I$, the other one where
$d$ goes in $O$. When $d$ goes in $I$, we have the same reasoning on arguments $a$,
$b$, $c$ and $e$ as in Example 4. The status of every argument being decided, a call
to FinalTest shows that $I = \{b, d, f, h\}$ is a preferred extension. The graph
representation of $AF_1$ at the end of the run on the first branch is:




When $d$ goes in $O$, we have the same reasoning on arguments $b$, $c$ and $e$ as
in Example 5. Argument $a$ is not in conflict with $I \cup U$ and defends itself against
$b$: $a$ belongs to $\mathrm{App}(I, U, R)$, so it goes in $I$. The status of every argument being
decided, a call to FinalTest shows that $I = \{a, f, h\}$ is a preferred extension
of $AF_1$. The graph representation of $AF_1$ at the end of the run on the second
branch is:




5 Discussion 

Since preferred extensions have been put forward by [Dun95] as a remedy to 
features of stable extensions that are sometimes undesired, it is interesting to 
compare the algorithms presented in the previous sections to similar ones that 
have been designed for the computation of stable extensions. 

Recall that, by definition, a subset $S$ of the set of vertices $A$ of some
argumentation framework $(A, R)$ is a stable extension of $(A, R)$ if and only if it is
conflict-free and dominant, that is if every element of $A$ which is not in $S$ has a
predecessor in $S$. In terms of graph-theoretic concepts, $S$ is a stable extension of
$(A, R)$ if and only if $R^{\pm}(S) \cap S = \emptyset$ and $A - S \subseteq R^{+}(S)$. A well-known result of
[Dun95] is that every stable extension is also a preferred one. So all the
properties that have been used to prune the enumeration of subsets of $A$ when looking
for preferred extensions can also be used when computing the stable extensions.
In particular, the algorithms of [Nie95,DMP97] also explore the equivalent of our
$R$-candidates. Although other properties that we have put forward in section 3
could be used for the computation of stable extensions, these algorithms use
other properties that are too strong to be used for the computation of preferred
extensions. In particular, in the case of stable extensions, we can add an
element $x$ to $I$ as soon as all its predecessors are in $O$, but in the case of preferred
extensions we need to ensure that the predecessors of $x$ have predecessors in $I$.

Our algorithm can be used to compute the stable extensions of an
argumentation framework, since these are preferred extensions as well: we only need to
check for every leaf if $O \subseteq R^{+}(I)$: every argument that is not in a stable
extension must have a predecessor in this extension. A detailed comparison of our
algorithm with that of [DMP97] can be found in [DM01].

The idea of looking for an element of a cycle to branch at a stage when no
pruning property can be applied (cf. Prop. 7) was first put forward by [DMP97].
However, [DMP97] propose to compute a feedback vertex set first, that is, a
set $F$ of vertices of a graph $(A, R)$ such that the graph restricted to $A - F$
contains no cycle. They then run an enumeration-like algorithm, choosing an
undecided element of $F$ every time a branching point is reached. Notice that
finding a minimal feedback vertex set of a given graph $(A, R)$ is known to be
an NP-complete problem, although heuristics can be used to find a small one.
However, there are cases where elements of a feedback vertex set, even a minimal
one, are of no use. This is why it may be more efficient to look for a cycle only
when necessary. Moreover, in order not to find the same cycle several times in
different branches of the algorithm, we can maintain a set of all cycles that have
been found so far during the execution of the algorithm.

Example 7. Let $A = \{a, b, c, d\}$ and let $R = \{(a, b), (b, c), (c, d), (d, b)\}$. Since $a$
has no predecessor, $a \in I$; then, the successor of $a$, that is $b$, has to be in $O$;
then $d$ cannot be defended, so it has to be in $O$, whereas $c$ has to be in $I$ since
it is defended by $a$ and it is not adjacent to it. Although there is a cycle here
($b\,R\,c\,R\,d\,R\,b$), the tree explored by the algorithm has a single branch.

It is not difficult to see that, in the case of the preferred semantics, strongly
connected components of an argumentation framework can be treated separately
(see e.g. [DM01]). This is not new, and is particularly interesting when generating
the preferred extensions: the first call to PrefEnum should then be split into
as many calls as there are strongly connected components. When answering a 
sceptical query, only the component to which the query belongs should be ex- 
plored. In the case of the credulous query answering algorithm only the strongly 
connected component to which the query belongs will be explored, since in this 
case the Select function returns predecessors of the set Op, that is, predecessors 
of predecessors of the current set I. 

The worst-case analysis of [DNT99] shows that, under the preferred seman- 
tics, the sceptical reasoning problem is in a complexity class above that of the 
credulous reasoning problem in the polynomial hierarchy. The extra cost of scep- 
tical reasoning appears in our algorithm in the stop conditions and in the final 
test, which involve maximality tests. Such a test is expensive since it implies 
some enumeration of the parts of the set O of the arguments which are Out of 
the extension being built: the algorithm needs to check that none of these parts 
could be added to the admissible set $I$ found so far while preserving the
admissibility property. Notice that we can use our enumeration algorithm in order to
perform this check: we would call it on the $R$-candidate (I,
and look for an admissible set different from $I$.

[VP00] and [CDM01] propose dialectical proof theories to answer credulous
queries on preferred semantics. These proof theories have the form of a dialogue
between a proponent and an opponent. An important aspect of such proofs is
that they are quite natural, in the sense that they give an easy way to understand
the implications of the underlying notions of acceptability. The two dialectical
proof theories of [CDM01] are directly inspired by the credulous query answering
algorithm presented here. They improve on the one of [VP00], since the proofs
that they produce are generally shorter than the proofs of [VP00].

[VP00] also propose a dialectical proof theory to answer sceptical queries,
but it works only in a particular case: when preferred extensions coincide with
stable extensions. This is a very restrictive condition, because stable extensions
do not always exist. This implies that the sceptical proof theory of [VP00] is
able to answer no sceptical query on the argumentation framework of example 1
(this framework has no stable extension).







The good results of [CDM01] suggest designing a dialectical proof theory to
answer sceptical queries, inspired by the sceptical query answering algorithm
presented here. This is left for future work.



Acknowledgements. We would like to thank the referees for helpful comments 
on an earlier version of the paper. In particular, one of them suggested the call 
to PrefEnum in order to perform the maximality checks. 

Abstract. We give an operational semantics for the logic programming
language BLP, based on the hereditary Harrop fragment of the logic of 
bunched implications, BI. We introduce BI, explaining the account of 
the sharing of resources built into its semantics, and indicate how it 
may be used to give a logic programming language. We explain that the 
basic input/output model of operational semantics, used in linear logic 
programming, will not work for bunched logic. We show how to obtain 
a complete, goal-directed proof theory for hereditary Harrop BI and 
how to reformulate the operational model to account for the interaction 
between multiplicative and additive structure. We give a prototypical 
example of how the resulting programming language handles, in contrast 
with Prolog, sharing and non-sharing use of resources purely logically. 



1 Bunched Logic and Logic Programming 

The logic of bunched implications, BI, freely combines an additive (intuitionis- 
tic) and a multiplicative (linear) implication as connectives of equal status [12,14, 
15]. Thus it stands in stark contrast with linear logic [4], in which intuitionistic 
implication is available via an exponential [12,14,15]. 

The semantics of BI may be motivated directly by modelling the notion of 
resource. Consider the following, very simple, axiomatization of the notion of 
resource (clearly, refinements are possible): 



- An underlying set of resources, $M$;
- A way of combining resources, $\cdot$;
- A representative for zero resources, $e$;
- A way of comparing resources, $\sqsubseteq$.

Mathematically, we recognize that we have naturally identified a (for now,
commutative) preordered monoid $\mathcal{M} = (M, e, \cdot, \sqsubseteq)$ of resources.

First, we may exploit the presence of the monoidal combining operation to
define the following multiplicative conjunction [17,12,14,15], in the possible-worlds
style [9]:

$m \models \phi * \psi$ iff there are $n$ and $n'$ such that $m \sqsubseteq n \cdot n'$, $n \models \phi$ and $n' \models \psi$



R. Gore, A. Leitsch, and T. Nipkow (Eds.): IJCAR 2001, LNAI 2083, pp. 289-304, 2001. 
© Springer- Verlag Berlin Heidelberg 2001 




290 P.A. Armelm and D.J. Pym 

This conjunction is interpreted as follows: the resource required to establish
$\phi * \psi$ is obtained by combining the resources required to establish each of $\phi$ and $\psi$.
Similarly, we can define the corresponding implication

$m \models \phi \mathrel{-\!*} \psi$ iff for all $n$ such that $n \models \phi$, $m \cdot n \models \psi$.

This implication is interpreted as follows: if the resource required to establish the
"function", $\phi \mathrel{-\!*} \psi$, is $m$ and the resource required to establish the "argument", $\phi$,
is $n$, then the resource required to establish the result is $m \cdot n$. Thus the function
and the argument do not share resources. Rather, their respective resources are
taken from distinct worlds.

Second, the presence of the preorder suggests the possibility of a satisfaction 
relation for the intuitionistic connectives, using worlds m,n £ M as usual: 

m\= p /\p iff m\= p and m\= p 

m\= p\/ p iff m\= p ov m\= p 

m \= p ^ p iff for all n C m, n 1= ^ implies n\= p. 

The conjunction (and the disjunction) are interpreted as follows: each of the 
conjuncts (disjuncts) may share resources with the other. The implication is in- 
terpreted as a “function” which may share resources with its argument. We refer 
to the meaning of the semantics described here as the sharing interpretation [12, 
15]. Similarly, the additives may be seen as describing loeal properties whereas 
the multiplicatives are global. We return to this point in § 4, in which we give a 
concrete, and implemented, programming example. 
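
The satisfaction clauses above can be checked mechanically on a small model. The following Python sketch is only an illustration: the three-element resource monoid (saturating addition on 0, 1, 2), the valuation, and the tuple encoding of formulae are assumptions made for the example and are not taken from BI's literature.

```python
from itertools import product

# Hypothetical finite resource monoid (an assumption for illustration):
# worlds 0, 1, 2 combined by saturating addition, with unit 0.
WORLDS = (0, 1, 2)


def combine(m, n):
    """Monoidal combination of two resources."""
    return min(m + n, 2)


def below(n, m):
    """The preorder on resources: n is below m."""
    return n <= m


def forces(m, formula, valuation):
    """Decide whether world m satisfies formula, clause by clause.

    Formulae are tuples: ("atom", name), ("and", a, b), ("or", a, b),
    ("imp", a, b) for the additive implication, ("star", a, b) for *,
    and ("wand", a, b) for the multiplicative implication.
    """
    kind = formula[0]
    if kind == "atom":
        return m in valuation[formula[1]]
    left, right = formula[1], formula[2]
    if kind == "and":
        return forces(m, left, valuation) and forces(m, right, valuation)
    if kind == "or":
        return forces(m, left, valuation) or forces(m, right, valuation)
    if kind == "imp":
        # for all n below m: n forces left implies n forces right
        return all(
            not forces(n, left, valuation) or forces(n, right, valuation)
            for n in WORLDS if below(n, m)
        )
    if kind == "star":
        # there are n1, n2 with m below n1.n2, n1 forcing left, n2 forcing right
        return any(
            below(m, combine(n1, n2))
            and forces(n1, left, valuation)
            and forces(n2, right, valuation)
            for n1, n2 in product(WORLDS, repeat=2)
        )
    if kind == "wand":
        # for all n forcing left: m.n forces right
        return all(
            forces(combine(m, n), right, valuation)
            for n in WORLDS if forces(n, left, valuation)
        )
    raise ValueError(f"unknown connective: {kind}")


# Hypothetical valuation: p holds at worlds 0 and 1, q only at world 0.
VALUATION = {"p": {0, 1}, "q": {0}}
P, Q = ("atom", "p"), ("atom", "q")
```

In this toy model world 2 satisfies the multiplicative conjunction of p with itself but not the additive one, which is the non-sharing versus sharing distinction in miniature.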

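Proof-theoretically, the presence of the two implications is, at first sight,
problematic. To see this consider that whilst we may easily distinguish between
multiplicative and additive elimination (or left) rules,

\[
\frac{\Gamma \vdash \phi \mathrel{-\!\!*} \psi \quad \Delta \vdash \phi}{\Gamma, \Delta \vdash \psi}\;{-\!\!*}E
\qquad
\frac{\Gamma \vdash \phi \to \psi \quad \Gamma \vdash \phi}{\Gamma \vdash \psi}\;{\to}E,
\]

how are we to distinguish the corresponding introduction rules? We may write

\[
\frac{\Gamma, \phi \vdash \psi}{\Gamma \vdash \phi \mathrel{-\!\!*} \psi}\;{-\!\!*}I
\quad \text{but how then to write a rule for } {\to}I, \quad
\frac{?}{\Gamma \vdash \phi \to \psi}\;?
\]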

A semantically clean solution is provided by moving to sequents built not out
of finite sequences of hypotheses but rather out of bunches of hypotheses, i.e.,
finite trees, with the leaf nodes being formulae, and the internal nodes denoted
by either “,” or “;”, and referred to as bunches. The grammar of bunches is given
as follows:

$\Gamma ::= \phi$              propositional assumption
    $\mid \emptyset_m$          multiplicative unit
    $\mid \Gamma, \Gamma$       multiplicative combination
    $\mid \emptyset_a$          additive unit
    $\mid \Gamma; \Gamma$       additive combination

We write $\Gamma(\Delta)$ to denote that $\Delta$ is a sub-bunch of $\Gamma$ in the evident sense.
Equality of bunches, $\equiv$, is given by the commutative monoid laws for “,” and
“;”, together with substitution congruence: if $\Delta \equiv \Delta'$, then
$\Gamma(\Delta) \equiv \Gamma(\Delta')$. A bunch is said to be multiplicative if its top-level
combinator is “,” and additive if its top-level combinator is “;”. Contraction and
Weakening are permitted for “;” but not for “,”:

\[
\frac{\Gamma(\Delta; \Delta) \vdash \phi}{\Gamma(\Delta) \vdash \phi}\;\text{Contraction}
\qquad
\frac{\Gamma(\Delta) \vdash \phi}{\Gamma(\Delta; \Delta') \vdash \phi}\;\text{Weakening.}
\]

In much of what follows, we regard “;” and “,” as multi-ary operators, and refer
to their operands as components.
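
As a rough illustration of bunches as trees and of equality modulo the commutative monoid laws, the following Python sketch represents a bunch with multi-ary combinators and compares two bunches after normalizing them. The class names, the encoding of units as empty bunches, and the sorting-based normal form are assumptions made for this example; Contraction and Weakening are deliberately not part of the equality.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Atom:
    """A leaf of a bunch: a formula, represented here just by a name."""
    name: str


@dataclass(frozen=True)
class Bunch:
    """An internal node: kind is "," (multiplicative) or ";" (additive).

    An empty tuple of parts stands for the unit of that combinator.
    """
    kind: str
    parts: Tuple["Node", ...]


Node = Union[Atom, Bunch]


def normalize(node):
    """Pick a representative modulo associativity, commutativity and units."""
    if isinstance(node, Atom):
        return node
    parts = []
    for part in map(normalize, node.parts):
        if isinstance(part, Bunch) and part.kind == node.kind:
            # Associativity: splice in same-kind children. A same-kind
            # unit has no parts, so it disappears here (unit law).
            parts.extend(part.parts)
        else:
            parts.append(part)
    if len(parts) == 1:
        return parts[0]
    # Commutativity: order the components canonically.
    return Bunch(node.kind, tuple(sorted(parts, key=repr)))


def equivalent(gamma, delta):
    """Equality of bunches modulo the commutative monoid laws."""
    return normalize(gamma) == normalize(delta)
```

For instance, `equivalent` identifies a bunch with its re-bracketed and re-ordered variants, but it keeps “a , b” and “a ; b” distinct, and it does not identify “a ; a” with “a”, since that step needs Contraction rather than the monoid laws.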

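The introduction and elimination rules for the multiplicative and additive
implications now go as follows:

\[
\frac{\Gamma, \phi \vdash \psi}{\Gamma \vdash \phi \mathrel{-\!\!*} \psi}\;{-\!\!*}I
\qquad
\frac{\Gamma \vdash \phi \mathrel{-\!\!*} \psi \quad \Delta \vdash \phi}{\Gamma, \Delta \vdash \psi}\;{-\!\!*}E
\]

\[
\frac{\Gamma; \phi \vdash \psi}{\Gamma \vdash \phi \to \psi}\;{\to}I
\qquad
\frac{\Gamma \vdash \phi \to \psi \quad \Delta \vdash \phi}{\Gamma; \Delta \vdash \psi}\;{\to}E
\]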

Turning to predication, consider that we can express a first-order sequent over
a collection $X$ of first-order variables as $(X)\Gamma \vdash \phi$. Given this point of view, we
can see that it is possible to allow not only $\Gamma$ to be bunched but also $X$. So
for each propositional rule, we have two possible forms of variable maintenance,
i.e., additive and multiplicative. For example, the two choices for the predicate
$*$R rule are

\[
\frac{(X)\Gamma \vdash \phi \quad (Y)\Delta \vdash \psi}{(X; Y)\Gamma, \Delta \vdash \phi * \psi}
\qquad
\frac{(X)\Gamma \vdash \phi \quad (Y)\Delta \vdash \psi}{(X, Y)\Gamma, \Delta \vdash \phi * \psi}
\]

The former choice is the one taken in linear logic and in this paper. It may be
simplified, via Weakening and Contraction, to

\[
\frac{(X)\Gamma \vdash \phi \quad (X)\Delta \vdash \psi}{(X)\Gamma, \Delta \vdash \phi * \psi}
\]

The latter is the one taken in the basic version of BI [12,14,15].

The presence of bunched variables has one very significant consequence: it
permits the definition of both additive, or extensional, and multiplicative, or
intensional, quantifiers. For example,

\[
\frac{(X; x)\Gamma \vdash \phi}{(X)\Gamma \vdash \forall x.\phi}
\qquad
\frac{(X, x)\Gamma \vdash \phi}{(X)\Gamma \vdash \forall_{new} x.\phi},
\]

where $x$ is not free in $\Gamma$, and

\[
\frac{(X)\Gamma \vdash \forall x.\phi \quad Y \vdash t : \mathrm{Term}}{(X; Y)\Gamma \vdash \phi[t/x]}
\qquad
\frac{(X)\Gamma \vdash \forall_{new} x.\phi \quad Y \vdash t : \mathrm{Term}}{(X, Y)\Gamma \vdash \phi[t/x]}
\]

Here, we assume a simple bunched calculus of term formation [12,14,15] and, as
before, the additive case may be simplified to use just one bunch of variables.
The corresponding existentials are similar.

Semantically, the additive quantifier is handled intuitionistically,

$m \models \forall x.\phi$ iff for all $n \sqsubseteq m$ and all $t$ defined at $n$, $n \models \phi[t/x]$,

i.e., the resource required to establish each instance of the quantified proposition
must be available at the starting world. The semantics of the multiplicative
explains our use of the term “new”:

$m \models \forall_{new} x.\phi$ iff for all $n$ and all $t$ defined at $n$, $m \cdot n \models \phi[t/x]$,

i.e., the resource required to instantiate the proposition is taken from a new
world or location. Again, the existentials are similar.

Our notion of logic programming is that introduced in [11,10], based on the
sequent calculus. We start with the fragment of the logic for which uniform proofs
are complete for logical consequence. Reading proofs from the root upwards,
uniform proofs require that right rules be applied whenever possible, so that left
rules are applied only when the right-hand side is atomic. Uniform proofs are
said to be simple just in case the implicational left rules are restricted to be
essentially unary. For example, in first-order intuitionistic logic, we get

\[
\frac{\Gamma \vdash \phi[t/x] \quad \alpha[t/x] \vdash \beta[t/x]}{\Gamma, \phi \supset \alpha \vdash \beta}\;{\supset}L,
\]

with $\alpha$, $\beta$ atomic and $\alpha[t/x] = \beta[t/x]$ (often, $\phi \supset \alpha$ is retained in the left-hand
premiss).

In intuitionistic logic, simple uniform proofs, which are goal-directed and in
which the non-determinism is confined to the choice of implicational formula, are
complete for hereditary Harrop sequents [11,10]. Simple uniform proofs amount
to the analytic notion of resolution. Taking all this together, we interpret
hereditary Harrop sequents $\Gamma \vdash_o \exists x.\phi$ as a logic program, $\Gamma$, together with a query, or
goal, $\phi$, in which there is a logical variable $x$ [8]. We use $\vdash_o$ to denote the simple,
uniform, i.e., resolution, proof, read from root to leaves.

In BI, the corresponding class of sequents may be defined. Bunched hereditary
Harrop formulae are given by the following grammar, in which $A$ denotes atoms
(we simplify a bit, for brevity, omitting, for example, universal goals):

Definite formulae  $D ::= A \mid D \wedge D \mid G \to A \mid D * D \mid G \mathrel{-\!\!*} A \mid \forall x.D \mid \forall_{new} x.D$

Goal formulae  $G ::= \top \mid A \mid G \wedge G \mid D \to G \mid G * G \mid D \mathrel{-\!\!*} G \mid G \vee G \mid \exists x.G \mid \exists_{new} x.G$

A bunched hereditary Harrop sequent is a sequent $\Gamma \vdash G$, where $\Gamma$ is a bunch of
definite formulae. Such sequents, for now without $\forall_{new}$ and $\exists_{new}$, are the basis of
the bunched logic programming language, BLP, to which we give an operational
semantics in § 3.

In intuitionistic logic, each of the reduction operators used in the execution
of goal-directed search is additive. In BI, however, as in linear logic [16,10,5],
we have multiplicatives which introduce a computationally significant difficulty.
The typical case is $*$R:

\[
\frac{\Delta \vdash_o \phi_1 \quad \Delta' \vdash_o \phi_2}{\Gamma \vdash_o \phi_1 * \phi_2}\;(\Gamma = \Delta, \Delta')\;{*}R
\]

Faced with $\Gamma \vdash_o \phi_1 * \phi_2$, the division of $\Gamma$ into $\Delta$ and $\Delta'$ must be calculated.

The basic solution, described for linear logic in [7,6,16,5], is the so-called
input/output model. First pass all of $\Gamma$ to the left-hand branch of the proof,
leaving the right-hand branch undetermined. Proceed with the left-hand branch
until it is completed. Then calculate which of the formulae in $\Gamma$ have been used
to complete the left-hand branch and collect them into a finite set, $\Gamma_{left}$. The
remaining, unused formulae may now be passed to the right-hand branch:

\[
\frac{\Gamma_{left} \vdash_o \phi_1 \quad \Gamma \backslash \Gamma_{left} \vdash_o \phi_2}{\Gamma \vdash_o \phi_1 \otimes \phi_2}\;(\Gamma = \Gamma_{left}, \Gamma \backslash \Gamma_{left})
\]

We refer to $\backslash$ as a “remainder operator” because it removes from $\Gamma$ the consumed
formulae and passes the remainder to the next branch.
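
The input/output idea for the linear case can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the operational semantics of this paper: it assumes a program that is a multiset of atoms and goals built only from atoms and the multiplicative conjunction, and the function names and the tuple encoding are hypothetical.

```python
from collections import Counter


def prove(goal, resources):
    """Try to prove goal; return the unused resources, or None on failure.

    goal is an atom (a string) or a tuple ("tensor", left, right);
    resources is a Counter holding a multiset of atoms.
    """
    if isinstance(goal, str):
        if resources[goal] == 0:
            return None                      # the atom is not available
        remainder = resources.copy()
        remainder[goal] -= 1                 # consume one copy of the atom
        return +remainder                    # unary plus drops zero counts
    _, left, right = goal
    after_left = prove(left, resources)      # give everything to the left branch
    if after_left is None:
        return None
    return prove(right, after_left)          # pass the remainder to the right


def provable(goal, atoms):
    """A goal is proved linearly only if nothing is left over at the end."""
    remainder = prove(goal, Counter(atoms))
    return remainder is not None and not remainder
```

Because the program contains only atoms, no backtracking is needed here; with clauses in the program, a failed right-hand branch would have to trigger a search for another proof of the left-hand one.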

In BI, the problem is made much more complex by the mixing of additive
and multiplicative structure enforced by the presence of bunches and the basic
input/output idea will not work. To see this, consider the following search for a
proof of the (provable) sequent $\phi, \psi \vdash (\chi \to (\psi \wedge \chi)) * \phi$ (note that it is convenient to
put the remainder operator, read as “without”, in the “current” computation):

\[
\dfrac{\dfrac{\dfrac{\text{(not provable)}}{(\chi; (\phi, \psi))\backslash\phi \vdash_o \psi \wedge \chi}}{(\phi, \psi)\backslash\phi \vdash_o \chi \to (\psi \wedge \chi)} \qquad \phi\backslash\emptyset_m \vdash_o \phi}{(\phi, \psi)\backslash\emptyset_m \vdash_o (\chi \to (\psi \wedge \chi)) * \phi}\;{*}R
\]

Consider the left-hand branch of the candidate $*$R rule (the actual operational
rules are defined formally in § 3). In order to get an axiom of the form $\psi \vdash \psi$, we
must first remove the $\chi$ from the program by performing a Weakening and then
perform a subtraction of $\phi$, which is required on the right-hand branch of the
$*$R, so that the result of $\chi; (\phi, \psi)\backslash\phi$ is $\psi$, i.e., the remainder operator first throws
away, via Weakening, the additive bunch surrounding the multiplicative bunch
within which $\phi$, the formula which must be removed, is contained. Now, the
remaining bunch is sufficient to form, after an $\wedge$R reduction, the axiom $\psi \vdash \psi$
but insufficient to form the necessary axiom for $\chi$.

At first sight, thinking of axioms of the form $\Gamma; \alpha \vdash \alpha$, it might seem like
we need a subtraction operation which does not perform Weakening. This would
solve the problem in the particular case above but would worsen it in other cases.
From being incomplete the system would become unsound. To see this, consider
a search for a proof of the unprovable sequent $\tau; (\phi, \psi) \vdash (\chi \to (\psi \wedge \tau)) * \phi$. Once
the $*$R rule is applied, after doing $\to$R, the propositions $\tau$ and $\chi$ are at the same
level in $(\chi; \tau; (\phi, \psi))\backslash\phi$ and, if this is taken to be equal to $\chi; \tau; \psi$, both $\chi$ and $\tau$
are provable.

Again it looks like this could be fixed, by requiring, for example, that the
$*$R rule be applied only on multiplicative bunches. Aside from the unpleasant
non-determinism introduced by this solution, it doesn’t quite solve the problem
either. To see this, it is enough to consider a slight modification of the last search.
Consider the unprovable sequent $\upsilon, (\tau; (\phi, \psi)) \vdash (\chi \to (\psi \wedge \tau)) * (\phi * \upsilon)$. Here, after
the application of $*$R and $\to$R it would be possible to prove both $\psi$ and $\tau$, which
is unsound. It seems clear that the root of the difficulty lies with the interaction
between the additives, in particular $\to$R, and the multiplicatives.

The basic idea for the solution, though far from simple in detail, is to
introduce stacks which keep a record of which resources have been added to the
program as a result of $\to$R and which manage their interaction with the formation
of axioms, and with subtraction, and with passing by continuations. The
detailed formulation of the continuation-passing style (CPS) operational semantics
is rather complex and before giving it, in § 3, we must look, in § 2, at
uniform, and simple, proofs in BI.



2 Uniform Proofs in BI 



So far we have discussed BI semantically, and informally considered its use as a
logic programming language. Formally, as we have indicated, the basis of logic
programming in BI relies on the availability of goal-directed proofs.

Proofs in BI may be presented as a sequent calculus, here given in Definition 1.
The Cut-elimination theorem holds [15] and semantic completeness theorems
are available [12,14,15,13]. In order to explain uniform proofs, and so also
the subsequent operational semantics, we restrict our attention to propositional
BI: although our logic programming language uses predicate BI, it exploits, in
its core language, only additive predication and quantification, just as in Prolog
[2,1], so that its use of logical variables and unification is completely standard.
It follows that our presentation involves no significant loss of generality.

The use of multiplicative predication and quantification is possible but more
complex. We conjecture that its main use will be in BLP’s module system, adapted
to BI from the basic ideas presented in [10], in which we conjecture the
importing of functions from one module to another may exploit the
additive-multiplicative distinction to valuable effect for the programmer. This topic is
beyond our present scope.



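Definition 1. The sequent calculus LBI is defined as follows:

Identities

\[
\frac{}{\phi \vdash \phi}\;\text{Axiom}
\qquad
\frac{\Gamma \vdash \phi \quad \Delta(\phi) \vdash \psi}{\Delta(\Gamma) \vdash \psi}\;\text{Cut}
\]

Structurals

\[
\frac{\Gamma(\Delta) \vdash \phi}{\Gamma(\Delta; \Delta') \vdash \phi}\;W
\qquad
\frac{\Gamma(\Delta; \Delta) \vdash \phi}{\Gamma(\Delta) \vdash \phi}\;C
\qquad
\frac{\Gamma \vdash \phi}{\Delta \vdash \phi}\;(\Delta \equiv \Gamma)\;E
\]

Units

\[
\frac{\Gamma(\emptyset_m) \vdash \phi}{\Gamma(I) \vdash \phi}\;IL
\qquad
\frac{}{\emptyset_m \vdash I}\;IR
\qquad
\frac{\Gamma(\emptyset_a) \vdash \phi}{\Gamma(\top) \vdash \phi}\;{\top}L
\qquad
\frac{}{\emptyset_a \vdash \top}\;{\top}R
\qquad
\frac{}{\bot \vdash \phi}\;{\bot}L
\]

Multiplicatives

\[
\frac{\Gamma \vdash \phi \quad \Delta(\Delta', \psi) \vdash \chi}{\Delta(\Delta', \Gamma, \phi \mathrel{-\!\!*} \psi) \vdash \chi}\;{-\!\!*}L
\qquad
\frac{\Gamma, \phi \vdash \psi}{\Gamma \vdash \phi \mathrel{-\!\!*} \psi}\;{-\!\!*}R
\]

\[
\frac{\Gamma(\phi, \psi) \vdash \chi}{\Gamma(\phi * \psi) \vdash \chi}\;{*}L
\qquad
\frac{\Gamma \vdash \phi \quad \Delta \vdash \psi}{\Gamma, \Delta \vdash \phi * \psi}\;{*}R
\]

Additives

\[
\frac{\Gamma \vdash \phi \quad \Delta(\Delta'; \psi) \vdash \chi}{\Delta(\Delta'; \Gamma; \phi \to \psi) \vdash \chi}\;{\to}L
\qquad
\frac{\Gamma; \phi \vdash \psi}{\Gamma \vdash \phi \to \psi}\;{\to}R
\]

\[
\frac{\Gamma(\phi_1; \phi_2) \vdash \psi}{\Gamma(\phi_1 \wedge \phi_2) \vdash \psi}\;{\wedge}L
\qquad
\frac{\Gamma \vdash \phi \quad \Delta \vdash \psi}{\Gamma; \Delta \vdash \phi \wedge \psi}\;{\wedge}R
\]

\[
\frac{\Gamma(\phi) \vdash \chi \quad \Delta(\psi) \vdash \chi}{\Gamma(\phi \vee \psi); \Delta(\phi \vee \psi) \vdash \chi}\;{\vee}L
\qquad
\frac{\Gamma \vdash \phi}{\Gamma \vdash \phi \vee \psi}\;{\vee}R
\qquad
\frac{\Gamma \vdash \psi}{\Gamma \vdash \phi \vee \psi}\;{\vee}R'
\]

The predicate substitution and quantifier rules may be given sequentially. □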



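Most of the permutations necessary to show that uniform proofs are
complete for bunched hereditary Harrop formulae are very straightforward. For
example,

\[
\dfrac{\Gamma \vdash \phi \quad \dfrac{\Delta(\psi) \vdash \chi \quad \Delta(\psi) \vdash \chi'}{\Delta(\psi) \vdash \chi \wedge \chi'}\,{\wedge}R}{\Delta(\Gamma, \phi \mathrel{-\!\!*} \psi) \vdash \chi \wedge \chi'}\,{-\!\!*}L
\quad\leadsto\quad
\dfrac{\dfrac{\Gamma \vdash \phi \quad \Delta(\psi) \vdash \chi}{\Delta(\Gamma, \phi \mathrel{-\!\!*} \psi) \vdash \chi}\,{-\!\!*}L \quad \dfrac{\Gamma \vdash \phi \quad \Delta(\psi) \vdash \chi'}{\Delta(\Gamma, \phi \mathrel{-\!\!*} \psi) \vdash \chi'}\,{-\!\!*}L}{\Delta(\Gamma, \phi \mathrel{-\!\!*} \psi) \vdash \chi \wedge \chi'}\,{\wedge}R
\]

or

\[
\dfrac{\Gamma \vdash \phi \quad \dfrac{\Delta(\psi) \vdash \chi \quad \Delta' \vdash \chi'}{\Delta(\psi), \Delta' \vdash \chi * \chi'}\,{*}R}{\Delta(\Gamma, \phi \mathrel{-\!\!*} \psi), \Delta' \vdash \chi * \chi'}\,{-\!\!*}L
\quad\leadsto\quad
\dfrac{\dfrac{\Gamma \vdash \phi \quad \Delta(\psi) \vdash \chi}{\Delta(\Gamma, \phi \mathrel{-\!\!*} \psi) \vdash \chi}\,{-\!\!*}L \quad \Delta' \vdash \chi'}{\Delta(\Gamma, \phi \mathrel{-\!\!*} \psi), \Delta' \vdash \chi * \chi'}\,{*}R.
\]

Note, however, that uniform proofs in BI must always perform any possible (and
trivial) $*$Ls (and $\wedge$Ls) before performing any right rule: to see this, consider that
we should like to have a uniform proof of $\phi * \psi \vdash \phi * \psi$.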

Weakening is, at first sight, a source of difficulty. Weakenings may be
permuted above all rules except $*$R. To see this, consider that $*$R must divide a
multiplicative bunch between its two premisses. So if a $*$R has a Weakening
immediately below it, then there is no way in which $*$R may be applied directly to
the resulting sequent. However, it turns out that this difficulty may be handled
within the operational semantics by examining in turn each of the multiplicative
bunches below the “;” introduced by the Weakening. To make this work, we
must consider a canonical form for bunches: a bunch is in canonical form iff

1. its left-hand branch is either a proposition, a unit or a canonical bunch of
the opposite (additive or multiplicative) type, and

2. its right-hand branch is canonical.

For example, if $\Gamma$ and $\Gamma'$ are additive and canonical and if $\Delta$ is canonical, then
$(\Gamma, \Gamma', \Delta)$ is canonical. Our operational semantics, in § 3, assumes that bunches
are in canonical form.

Lemma 1. Every bunch may be written in canonical form. □
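
One way to read Lemma 1 is that associativity lets us rotate any bunch to the right until the two conditions hold. The Python sketch below illustrates that reading on binary trees. The tuple encoding `(kind, left, right)`, the use of strings for propositions and units, and the treatment of a lone proposition or unit as canonical are assumptions of this example.

```python
def is_canonical(bunch):
    """Check the two conditions of the canonical form on a binary bunch.

    A bunch is a string (a proposition or a unit) or a tuple
    (kind, left, right) with kind "," or ";".
    """
    if isinstance(bunch, str):
        return True                       # assumed base case
    kind, left, right = bunch
    left_ok = isinstance(left, str) or (
        left[0] != kind and is_canonical(left)   # opposite type, canonical
    )
    return left_ok and is_canonical(right)


def canonicalize(bunch):
    """Re-associate to the right using only associativity of "," and ";"."""
    if isinstance(bunch, str):
        return bunch
    kind, left, right = bunch
    left, right = canonicalize(left), canonicalize(right)
    if not isinstance(left, str) and left[0] == kind:
        # (l1 k l2) k r  becomes  l1 k (l2 k r)
        return canonicalize((kind, left[1], (kind, left[2], right)))
    return (kind, left, right)
```

The rotation only uses a law that is already part of bunch equality, so the canonical bunch is equal, as a bunch, to the one we started from.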



Proposition 1. Uniform proofs in BI are complete for bunched hereditary Harrop
sequents. □

As we have seen, uniform proofs do not, however, characterize resolution. For
that, we must ensure that the choice of clause in each implicational left rule is
also goal-directed. For that we require proofs which are not merely uniform but
simple, i.e., in which all instances of implicational left rules are of the following,
essentially unary, forms ($I$ is the unit of $*$):

\[
\frac{\Gamma \vdash \phi \quad \Delta \vdash I}{\Delta, (\Gamma, \phi \mathrel{-\!\!*} \alpha) \vdash \alpha}\;{-\!\!*}L
\qquad
\frac{\Gamma \vdash \phi}{\Gamma; \phi \to \alpha \vdash \alpha}\;{\to}L
\]

These rules are clearly admissible in LBI but to understand why the $-\!\!*$L rule
is complete we must consider not only the canonical form for bunches but also
that we may replace the basic axiom sequent with the following CutAxiom rule:

\[
\frac{\Gamma \vdash I}{\Gamma, \alpha \vdash \alpha}\;\text{CutAxiom (CA)}.
\]

The effect of this rule, which may be seen as a form of garbage collection, is to
absorb, trivially, unused multiplicative bunches. Then we can make the following
transformation of proof figures ($\emptyset_m$ is the unit of “,”, i.e., $I$ on the left):

\[
\dfrac{\Gamma_1 \vdash \phi \quad \dfrac{\Gamma_2 \vdash I}{\Gamma_2, \alpha \vdash \alpha}\,\text{CA}}{\Gamma_1, \Gamma_2, \phi \mathrel{-\!\!*} \alpha \vdash \alpha}\,{-\!\!*}L
\quad\leadsto\quad
\dfrac{\dfrac{\Gamma_2 \vdash I \quad \dfrac{\Gamma_1 \vdash \phi}{\Gamma_1, \emptyset_m \vdash \phi}\,\text{Unit of “,”}}{\Gamma_1, \Gamma_2 \vdash \phi}\,\text{Cut} \quad \alpha \vdash \alpha}{\Gamma_1, \Gamma_2, \phi \mathrel{-\!\!*} \alpha \vdash \alpha}\,{-\!\!*}L
\]

in which the right-hand premiss of the $-\!\!*$L is rendered trivial.

This step, together with an analysis of the permutations of $-\!\!*$L and $\to$L
rules with respect to one another, is sufficient to give us the following:

Proposition 2. Simple uniform proofs in BI are complete for bunched hereditary
Harrop formulae.

3 An Operational Semantics for BLP

As we have suggested, the operational semantics of BLP may be seen as a
development of the input/output model, described in [7], to account for the bunched
structure of sequents. However, this development represents a substantial
generalization of the method and involves a degree of technical complexity.

Starting from uniform proofs, presented in § 2, we see that the remaining
source of non-determinism is, essentially, the splitting of bunches in the $*$R
operator. Thus a remainder can be valid only if it is calculated on the left-hand
branch of a search above a $*$R operator. However, additional operational
complexity arises from the bunched structure itself, i.e., from the interaction
between the multiplicative and additive implications. (Notation: for clarity, we
use just $\Gamma$s, $\phi$s, etc., rather than the more cumbersome $D$s and $G$s notation. We
distinguish atomic formulae as $\alpha$s. Nevertheless, we are working with hereditary
Harrop sequents.)

It follows that operational sequents have the form $(\Gamma|s)_n\backslash^{n'}(\Gamma'|s') \vdash_o \phi$, which
should be interpreted as follows:

— $\Gamma$ is the bunch which is passed up the current branch of the search tree;

— $\Gamma'$ is the bunch which is passed down the current branch of the search tree;

— $n$ and $n'$ are counters which keep track of the number of $\top$ goals found. This
is necessary since a $\top$ goal makes the logic locally affine. We use a counter
instead of a simple flag so that, when applying the $-\!\!*$R rule to a goal $\phi \mathrel{-\!\!*} \psi$, if a
$\top$ goal is found while proving $\psi$ (i.e., if $n' > n$) then any left-over of $\phi$ can
be removed instead of failing;

— $s$ and $s'$ denote stacks of bunches and are used to manage the interaction
between the reduction operators used and the formulae available:

  - $s$ manages upward propagation via open boxes and locked boxes;

  - $s'$ manages downward propagation via full boxes, open boxes and locked
    boxes.

Informally, full, open and locked boxes, which are used in the definitions of
the $*$R, $\wedge$R and $\to$R operators, arise as follows:

Full box: $\{\Delta, \phi\}$, introduced downwards, arising from assignment to
an unknown remainder; creating a full box corresponds to an assignment in
the CPS execution.

Open box: introduced upwards, arising from a search-figure in which a $*$R
occurs below an $\to$R. It carries the “theorem flag”, explained below.

Locked box: introduced upwards, arising from a $*$R.

Their meanings will be made clearer in our description (below) of the
operational semantics;

— Finally, $\backslash$ is a remainder operator: informally, it is read as “without” or
“leaves unused”. Formally, its meaning is defined by the operators for $\vdash_o$.

Overall, an operational sequent, $(\Gamma|s)_n\backslash^{n'}(\Gamma'|s') \vdash_o \phi$, is read as “given a program
$\Gamma$ with a stack $s$ as input, after proving $\phi$ we are left with a program $\Gamma'$ and
stack $s'$ for the subsequent computation”. An $n > 0$ indicates that a theorem or
$\top$ was found as a goal, so that we are (locally) in affine BI.

At each occurrence of a $*$R rule, a box is created and a flag, represented
by $\varepsilon$ (for empty flag) and placed at the left-hand end of the bunch which is the
remainder, is set to indicate whether remainders from the left-hand branch of
the search are permitted (if $\varepsilon$ is present, then remainders are not permitted; if
it is absent, they are; it is rather like $I$). Initially, the box is locked, and is put
at the top of the upward stack.

When an $\to$R operator is used, the box is unlocked and the antecedent of
the implication $\phi \to \psi$ is put into it: so we get a stack of the form $[\phi]::s$. If the box
is unlocked already, with previous implicational antecedents in it, then the new
antecedent is additively combined with the existing ones: so we get a stack of the
form $[\Delta; \phi]::s$. Also, the empty flag is reset, thereby allowing no remainder to pass
except through the box. At this stage the open box carries the “theorem flag”,
which indicates that $\phi \to \psi$ is regarded as possibly an intuitionistic theorem,
failure of which will be detected at a CutAxiom rule. This procedure is necessary
because theorems behave like $\top$ with respect to multiplicative resources.

A box that contains a remainder is denoted by $\{\Delta, \phi\}::s$. Such boxes occur
in downward stacks and indicate that the computation was performed under the
wrapping of $\phi$, arising from additives, and leaves a remainder $\Delta$.

Finally, in order to give the operators of the operational semantics, we need
to formally define a subtraction operator. We are now using $\backslash$ to calculate
remainders but we still need a basic way of removing a bunch from a super-bunch
within which it lives. The subtraction of $\Gamma$ from $\Delta$ is definable just in case $\Gamma$ is
a sub-bunch of $\Delta$, written $\Gamma \sqsubseteq \Delta$, which is defined, exploiting the canonical form
for bunches, as follows (the cases depend on whether $\Gamma$ and $\Delta$ are additive or
multiplicative; we label the components of a bunch $\Delta$ as $\Delta_i$s):

— If $\Gamma = \emptyset_m$, then $\Gamma \sqsubseteq \Delta$, any $\Delta$;

— If $\Gamma$ and $\Delta$ are both additive, then $\Gamma \sqsubseteq \Delta$ iff $\Gamma = \Delta$ or $\Gamma \sqsubseteq \Delta_i$, some $i$;

— If $\Gamma$ is additive and $\Delta$ is multiplicative, then $\Gamma \sqsubseteq \Delta$ iff $\Gamma \sqsubseteq \Delta_i$, some $i$;

— If $\Gamma$ is multiplicative and $\Delta$ is additive, then $\Gamma \sqsubseteq \Delta$ iff $\Gamma \sqsubseteq \Delta_i$, some $i$;

— If $\Gamma$ and $\Delta$ are both multiplicative, then $\Gamma \sqsubseteq \Delta$ iff $\Gamma = \Delta$ or $\Gamma \sqsubseteq \Delta_i$, some $i$.

We then write $\Delta - \Gamma$ to denote the following subtraction operation: find the
additive/multiplicative sub-bunch of $\Delta$ within which $\Gamma$ lives and delete all of
that sub-bunch. For example, $(\phi, ((\psi, \psi'); \chi)) - (\psi, \psi') = \phi$. (Formally, $\Delta - \Gamma$ is
defined by recursion over additive/multiplicative cases.)

The operators, read from conclusion to premisses, which describe the
operational semantics of BLP, are summarized in Table 1. The presentation relies on
the execution of the left-hand branches before the right-hand. We describe the
continuation-passing style execution of the operational semantics by considering
the key cases of the operational reductions, given in Table 1, in turn.
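
The subtraction operation $\Delta - \Gamma$ introduced above can be illustrated by a short recursive Python sketch. It is one possible reading of the informal description, not the formal definition: it assumes multi-ary bunches encoded as `(kind, parts)` tuples with strings as propositions, it assumes that $\Gamma$ occurs in $\Delta$ as a whole node, and the function names are hypothetical. Inside an additive bunch, the siblings of the component containing $\Gamma$ are discarded (a Weakening); inside a multiplicative bunch, they are kept.

```python
EMPTY_M = (",", ())   # the multiplicative unit, encoded as an empty "," bunch


def contains(delta, gamma):
    """True if gamma occurs in delta as a whole node."""
    if delta == gamma:
        return True
    if isinstance(delta, str):
        return False
    return any(contains(part, gamma) for part in delta[1])


def subtract(delta, gamma):
    """Compute delta - gamma under the assumptions stated above."""
    if gamma == EMPTY_M:
        return delta                      # removing the unit changes nothing
    if delta == gamma:
        return EMPTY_M
    if isinstance(delta, str):
        raise ValueError("gamma is not a sub-bunch of delta")
    kind, parts = delta
    for i, part in enumerate(parts):
        if contains(part, gamma):
            rest = subtract(part, gamma)
            if kind == ";":
                return rest               # Weakening drops the additive siblings
            # Multiplicative siblings survive; splice in what is left of part.
            if isinstance(rest, tuple) and rest[0] == ",":
                spliced = rest[1]
            else:
                spliced = (rest,)
            new_parts = parts[:i] + spliced + parts[i + 1:]
            return new_parts[0] if len(new_parts) == 1 else (",", new_parts)
    raise ValueError("gamma is not a sub-bunch of delta")
```

On the two examples given in the text, this sketch returns $\phi$ for $(\phi, ((\psi, \psi'); \chi)) - (\psi, \psi')$ and $\psi$ for $(\chi; (\phi, \psi)) - \phi$.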

Cut Axiom: We are given a bunch $\Gamma$ which contains the atom $\alpha$, i.e., the
principal formula of the axiom, in any position, and a stack $s$. There is exactly
one way of performing the minimum number of Weakening reductions, reducing
$\Gamma(\alpha) \vdash \alpha$ to $\Gamma', \alpha \vdash \alpha$, on $\Gamma$ so as to bring $\alpha$ to top-level. This is done and we
give to the success continuation the resulting bunch without $\alpha$, i.e., the remainder
$\Gamma'$. In fact, since the only possible reduction above a CutAxiom is one of the
Unit reductions, we can be more specific about what the form of $\Delta$ and $s'$ must
be. For example, if $\Delta$ is $\varepsilon$, $\Gamma'$ is not equal to $\emptyset_m$ and $s$ has an open box on top,
then $s'$ must have a full box on top. Otherwise, $\Delta$ must be equal to $\Gamma'$ and $s'$
must be equal to $s$. If the top of the stack is an open box, it is
necessary to check whether $\alpha$ is in its contents. If not, then the theorem flag of the open
box must be removed because it means that the $\phi \to \psi$ in a previous $\to$R which
created the open box is not a theorem.

$-\!\!*$L: We start with a bunch in which the clause $\phi \mathrel{-\!\!*} \alpha$ occurs in an arbitrary
position. The unary, or “resolution”, version of $-\!\!*$L is invoked but the bunch
taken in the premiss is that which is obtained, as in the CutAxiom case, by
performing the minimum number of Weakening reductions on $\Gamma$ so as to bring
the clause to top-level.

$*$R: We start with $\Gamma$ and $s$ given and try to prove $\phi$ from $\Gamma$ and $s$ with a locked
box pushed on top. Upon success, we get a remainder, $\Gamma'$. At this point, we can try to prove $\psi$ from $\Gamma'$
and $s$. Upon success, we get as a result a remainder $\Delta$ together with an arbitrary
modification of the stack. Because we managed to prove $\phi * \psi$ from $\Gamma$, leaving $\Delta$,
this result is given to the final success continuation. The special case in which $\Delta$
is $\varepsilon$ works in the same way but for the fact that $\Gamma'$ and $s$, which must be $[\,]$, will
prove $\psi$ only when $\Gamma'$ proves $\psi$ without leaving any remainder. So, $\Gamma$ and $s$ are
given, $\Gamma'$ and $\Delta$ are calculated, and $s$ is modified as necessary, depending on the
reductions encountered above. An example which includes this case follows after
the description of the operators. If a $\top$ goal or an intuitionistic theorem was
found, the counter will be greater than zero and the computation will succeed
even if $\varepsilon$ is set and there is a non-empty remainder.

$\wedge$Rs: There are five subcases. In the first two, a remainder is allowed, which
may occur only if we are on some multiplicative branch of the search. In the first,
handled by $\wedge$R, the bunch $\Gamma$ and the stack $s$ are given. Notice that the stack $s$ is
the same on both sides of the left-hand branch: modifications to the stack before
passing to the next branch occur only in operators in which remainders are not
permitted. Moreover, notice that the right-hand branch gets an empty stack: it
has already all the information needed to search for a proof. For example, this
case of the rule is used for the proof of $(\phi; \psi), \chi \vdash (\phi \wedge \psi) * \chi$ and, indeed,
the same program, with this rule, proves $(\psi \wedge \phi) * \chi$, $(\phi \wedge \phi) * \chi$, $(\psi \wedge \psi) * \chi$,
etc. In each case, the context given to the left-hand branch will be equivalent to
$(\phi; \psi), \chi$ and the subtraction, “$\Gamma - \Delta$”, will leave $\chi$ for the right-hand branch.
The second case, $\wedge$R$'$, takes care of the case in which $\top$ is found as a goal in both
conjuncts. Of course, the $\wedge$R rules do not propagate the presence of $\top$ from the

Table 1. Summary of the Operational Semantics of BLP



Unit®(n > 0) 



- TUnit 



(r|s)o\°(r|s) I 



Unit 



CutAxiom^ 



(^;r|3:s)„Y(£|{r,0}::s)h„/ 

(C;-T(a)|[|]::s)^"(A|s') ho a 



Unit'’ 



{r\s)X{e\s) ho I 
{r'\s)X'{^\s) ho I 

(r(a)|s)„Y'(Z\|s') ho a 

(r'|s)„\"(A|s') ho (e;0^a|[])o\"'(£|[])ho<^ {r'\s)^{A\s') h, I 

Jj ^ 1j 



CutAxiom't 



(r((?!n=a)|s)^’'(A|s') ho a (r(6>; <?i-^a)|s)^’*(A|s') ho a 

(rp::s)„Y(r'p::s) ho 0 {r'\s)X"{^W) ^ (0, r\s)XxW) ^ 






(-T|s)X (^|s') ho (U|s)X(^1sO l”o (l>^tp 

(r|s)„\"(A|s) ho 0 (r-A|[])oV (£|[]) ho 

(-r|s)^"(A|s) ho 4>A'<Ij 

{r\s)X\X) 4> {r\s)X'X\s) ho 






-ar 



(r|s)^"(A n A'|s) ho (i)A'tp 

(r|s)^"(£|s) ho </> (P|s)i\" (e|s) ho 

{r\s}^ (e|s) ho </>AV> 



A/?'(n', n" > n) 



-AR 



at 



(C;r|ps)„Y(£|{A.c}::s)ho<^ {^■,{r-A)\[]X"{s\[])K^ .. 

y rr, AR'"“ 

(£|{A,C}::s)hoM^ 



(C; j^iP:g)nV(£|pg) ho ^ 

(^;r|p:s)y"(£|{A,C}::s)ho0A^ 

X r|D)n\"'(£|D) V> (0 ; XX K ^ 

^ R ^ i?' 



(^l[])n\>|D)^.0^V> 






XeX\\MXXU{A{e,<P)}X ho V> (0;r|g:s}.\''(£|g:s)hoU ^ 



(S;r|@-::s)„\'(£|{A,a::s)ho0^V> 



(rp::s)^"+i(rp::s) ho 



(ris)XUk') ho 4> imX'iAs') ho i> 

VR ^ Vi? 



(r|s)^"(A|s') ho 0 V V’ 



(r|s)X"(-4|s^) ho 0 V v> 



† $\Gamma'$ is obtained uniquely by performing the minimal number of Weakenings required
to bring $\alpha$, $\phi \mathrel{-\!\!*} \alpha$ or $(\theta; \phi \to \alpha)$ to top-level.

‡ If $n' > n$ then $\Delta' = \Delta - \phi$, else $\Delta' = \Delta$.
§ If $n', n'' > n$, then $n''' = n + 1$, else $n''' = n$.

t] Here Q^::s indicates that here the theorem flag may be present or not. 
left branch to the right branch. An increased counter should be passed on to the
rest of the computation since any left-over found later could have been given to
the $\phi \wedge \psi$ subproof and weakened away at this point. In the other three, handled
by the remaining $\wedge$R operators of Table 1, no remainder is permitted. Here we have three subcases,
depending on whether the stack is full on return of the left, right or neither
branch. An increased counter will be passed to the rest of the computation only
if $\top$ is found in both left and right subproofs. An example which includes one of
these cases follows after the description of the operators.

$\to$Rs: There are four subcases. All four share the property that the
antecedent of the implication leaves no remainder. Note that we assume that any
occurrences of $\wedge$ or $*$ in the antecedents $\phi$ in $\phi \to \psi$ which are (inductively)
principal connectives are immediately removed using operators corresponding to
the operationally trivial $\wedge$L or $*$L rules. An example which illustrates $\to$R follows
after the description of the operators. The last rule is used when the implication
is an intuitionistic theorem, when it should behave in similar ways to $\top$. The
remaining cases are similar. In all cases, failure invokes backtracking.

A worked example, as mentioned above, will clarify these complex
constructions (here, “CA” denotes CutAxiom and the counters are 0 throughout):

Unit" Unit 

(<^@)\(£|W,x})b„7 (0™|D)\(£|D)bo7 ^ 

(x;V’ID)\(£|D) bo X ... 



{x-,{f,f’)M)\{e\{f,x}}^oPAx ^ (0™|D)\(£|D)bo7 

((0,V>)P)\(0P) ho X^(^Ax) (</'ID)\(£|D) bo <p 



((<?^, '*/')! D)\(£|D) bo (x^(V’Ax))*<?!> 

Thus we revisit our earlier, problematic example, showing how the use of boxes
manages the interaction between the multiplicatives and additives, especially $\to$R.

In order to establish the soundness and completeness, with respect to logical
consequence, of the operational semantics, we need to relate intermediate states
of a computation, i.e., operational sequents, to logical consequences, i.e., BI
sequents. To this end we introduce a mapping $\backslash$ from pairs of bunch-stack pairs,
together with the goal, i.e., $(\Gamma, s)_n\backslash^{n'}(\Gamma', s') \vdash_o \phi$, to bunches. The idea is to extend
the basic subtraction operation, defined for bunches, to bunch-stack pairs. There
are three cases.

1. Remainders are not allowed and a full box is on the top of the downward 
stack: (^; R||]1:s)\(e|{Z\, ^}::s) = f{f); (R - A), where f{f) is ^ if p is an 
additive conjunction or implication, and is T otherwise. 

2. Remainders are not allowed and there is something other than a full box on
top of the downward stack: $(\Gamma|s)\backslash(\varepsilon|s) = \Gamma$ $(= \Gamma - \emptyset_m)$.

3. Finally, in case neither (1) nor (2): $(\Gamma|s)\backslash(\Delta|s) = \Gamma - \Delta$.

Then we get, by induction on the structure of proofs, the following:

Lemma 2. The stacks used in the operational semantics do not corrupt the
logical consequence relation:

$(\Gamma, s)_n\backslash^{n'}(\Gamma', s') \vdash_o \phi$ if and only if $(\hat{\Gamma}, s)\backslash(\hat{\Gamma}', s') \vdash \phi$,

where if $n > 0$, then $\hat{\Gamma}$ and $\hat{\Gamma}'$ may be sub-bunches of $\Gamma$ and $\Gamma'$, and if $n = 0$,
then $\hat{\Gamma} = \Gamma$ and $\hat{\Gamma}' = \Gamma'$. □

Soundness and completeness follow as a corollary of Lemma 2.

Theorem 1. The operational semantics is sound and complete with respect to
the bunched sequent calculus, LBI:

$(\Gamma|[\,])\backslash(\varepsilon|[\,]) \vdash_o \phi$ if and only if $\Gamma \vdash \phi$. □

We conjecture that our operational techniques will be applicable to a wide
range of substructural logics.

4 Programming in BLP: An Example 

Recall that our semantics for BI’s connectives, as set out in § 1, is couched in
terms of sharing. Here, we give a quite generic, yet small, example of the type
of problems for which bunched logic programming is well-suited. Consider the
bunch $(p(a1); p(a2)), (p(b1); p(b2))$. Here, $p(x)$ means “$x$ is a person”. The bunch
structure shows that $a1$ and $a2$ belong to the same group and that $a1$ and $b1$
belong to belligerent groups. To say that two individuals are possibly in a fighting
relation we say simply $\forall x, y.\, p(x) * p(y) \mathrel{-\!\!*} fight(x, y)$, which is to say that $x$ and $y$
may fight if they belong to different groups. A complete BLP program (we write
on one line to save space) would be (here, T is $\top$, the unit of $\wedge$)

(p(a1);p(a2)), (p(b1);p(b2)), [x,y]fight(x,y) *- p(x)*p(y)*T

Notice that the definition of fight has been slightly modified to take into account
that there might be more than two groups; the additional groups may simply be disregarded.

An alternative solution would be to decorate each group with a multiplicative
unit to signal that it can be ignored. So we might have, for example,

(p(a1);p(a2);I), (p(b1);p(b2);I), (p(a5);p(a6);I)

However, the first approach is to be recommended since it doesn’t produce
redundant solutions. Adding a unit to each group allows the unit operation to be
performed in different places, but without changing the solution.

The following is an equivalent Prolog program for this problem. It uses tags
to distinguish the groups:

p(a1,t1). p(a2,t1). p(b1,t2). p(b2,t2).
fight(X,Y) :- p(X,T), p(Y,U), T \= U.

Thinking of political parties, sometimes they split into rival factions but each
faction in turn might want to keep its former allies. This situation might be
represented by the bunch $(p(a1); p(a2)), (p(b1); (p(b21); p(b22)), (p(b23); p(b24)))$.
Notice that $b21$ fights with $a1$ and $a2$ but also with $b23$ and $b24$. If we call $x$ and
$y$ allies if they don’t fight, then despite $b1$’s being an ally of $b21$, and also of $b23$,
$b21$ and $b23$ are not allies. The modification of the BLP program to reflect this
state of affairs is straightforward:

(p(a1);p(a2)), (p(b1); (p(b21);p(b22)), (p(b23);p(b24))), 
[x,y]fight(x,y) *- p(x)*p(y)*T 

Notice that the defining clause doesn’t need any modification. 

To modify the Prolog program we could start by adding an extra tag to 
reflect the structure of the problem, like this: 

p(a1,t1,_). p(a2,t1,_). 
p(b1,t2,_). 
p(b21,t2,t1). p(b22,t2,t1). 
p(b23,t2,t2). p(b24,t2,t2). 
fight(X,Y) :- p(X,T,_), p(Y,U,_), T\=U. 
fight(X,Y) :- p(X,T,V), p(Y,U,W), T=U, V\=W. 

and we should be aware that the whole program has had to be modified to ac- 
count for the extra tag. Or a new, more flexible implementation may be dreamed 
up, like using lists of tags as a second argument: 



p(a1,[t1]). 
p(a2,[t1]). 
p(b1,[t2]). 
p(b21,[t2,t1]). 
p(b22,[t2,t1]). 
p(b23,[t2,t2]). p(b24,[t2,t2]). 

fight(X,Y) :- p(X,U), p(Y,S), mismatch(U,S). 
mismatch([H1|_],[H2|_]) :- H1\=H2. 
mismatch([H1|T1],[H2|T2]) :- H1=H2, mismatch(T1,T2). 



Please compare the heavy machinery used in this example with the simplicity of 
the BLP version. 
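
For readers less familiar with Prolog, the list-of-tags encoding can be sketched in Python as follows. This is only an illustrative translation of the Prolog program, not part of BLP; the names `PEOPLE`, `mismatch` and `fight` are chosen here for the sketch.

```python
# Each person is mapped to the path of group tags leading to it,
# mirroring the second argument of p/2 in the Prolog program.
PEOPLE = {
    "a1": ["t1"], "a2": ["t1"],
    "b1": ["t2"],
    "b21": ["t2", "t1"], "b22": ["t2", "t1"],
    "b23": ["t2", "t2"], "b24": ["t2", "t2"],
}

def mismatch(u, s):
    """True if the two tag paths differ at some position that both paths have."""
    return any(h1 != h2 for h1, h2 in zip(u, s))

def fight(x, y):
    """x and y may fight if their tag paths mismatch."""
    return mismatch(PEOPLE[x], PEOPLE[y])
```

As in the Prolog version, a shorter path that is a prefix of a longer one never mismatches, which is why b1 remains an ally of both b21 and b23 while b21 and b23 fight each other.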

The bunched structure also helps to give fine control over the scope of 
predicates, especially implications. In the example above, we can think of a variety of 
ways in which constants can be predicated. For example a2 might be a special 
kind of person. It would be possible to modify the program in the following way: 



(p(a1); q(a2); [x]p(x) <- q(x)), 
(p(b1); (p(b21); q(b22)), (p(b23); p(b24))), 
[x,y]fight(x,y) *- p(x)*p(y)*T 

Now this program says that a2 is a q but also that all qs are ps. However, this 
relation between ps and qs holds only for the group formed by a1 and a2, i.e., 
is local to that world. Other qs appearing in other places in the program, for 
example b22, will not be picked up by the local implication <- (which matches 
the “;” combining p(a1) and q(a2)). 

The language BLP has been implemented, in the continuation-passing style, 
by Armelín using the OCaml system [3]. 

Abstract. Skepticism is one of the most important semantic intuitions 
in artificial intelligence. The semantics formalizing skeptical reasoning 
in (disjunctive) logic programming is usually named well-founded se- 
mantics. However, the issue of defining and computing the well-founded 
semantics for disjunctive programs and databases has proved to be 
far more complex and difficult than for normal logic programs. The 
argumentation-based semantics WFDS is among the most promising 
proposals that attempt to define a natural well-founded semantics for 
disjunctive programs. In this paper, we propose a top-down procedure 
for WFDS called D-SLS Resolution, which naturally extends the Global 
SLS-resolution and SLI-resolution. We prove that D-SLS Resolution is 
sound and complete with respect to WFDS. This result in turn provides 
a further yet more powerful argument in favor of the WFDS. 



1 Introduction 

Disjunctive logic programming (DLP) has gained wide acceptance as an impor- 
tant tool for knowledge representation. One critical reason is that DLP is more 
expressive and natural to use than normal (i.e. non-disjunctive) logic program- 
ming. The additional expressive power allows direct encodings of a great number 
of application domains into logic programs. However, the issue of defining and 
computing semantics for disjunctive programs and databases has proved to be 
far more complex and difficult than for normal logic programs. Skepticism 
and credulism are two major semantic intuitions for knowledge representation. A 
skeptical reasoner does not infer any conclusion under uncertainty, while 
a credulous reasoner tries to draw as many conclusions as possible. Therefore, 
a skeptical reasoner usually gets more feasible conclusions. In normal logic 
programming, these two opposite semantic intuitions are suitably captured by the 
well-founded semantics [11] and the stable semantics [6], respectively. There has 
already been a widely accepted stable semantics for disjunctive programs [9]. 
To date, there is no widely accepted well-founded semantics for DLP and no 
consensus has been reached about what constitutes an intended semantics for 
skeptical reasoning in DLP. Based on a comparative study of some recent 
approaches to defining well-founded semantics for disjunctive programs in [2,9,5,7,12], 
it has been proved in [13] that these approaches become equivalent when 
some “minor” modifications are made to them. Specifically, there exists a 
semantics (i.e. WFDS*) for well-founded reasoning in DLP which can be equivalently 
characterized by argumentation, program transformations and unfounded sets. 

In the same style as the D-WFS defined in [1,2], a bottom-up computation 
procedure has also been provided in [13]. In this paper, we investigate the problem 
of top-down computation for disjunctive well-founded semantics. Specifically, 
we propose a top-down procedure for disjunctive well-founded semantics called 
D-SLS Resolution, which naturally extends the Global SLS-resolution and SLI- 
resolution. We prove that D-SLS Resolution is sound and complete with respect 
to WFDS*. 

Since logic programming is essentially goal-oriented, the existence of an ele- 
gant top-down procedure is surely a significant feature for query answering under 
any semantics. Our results in turn provide further yet more powerful arguments 
in favor of the semantics WFDS*. 

The paper is organized as follows. In the next section we briefly recall re- 
lated definitions in logic programming and specify our notations. In Section 3 
we give the argumentative definition of the semantics WFDS*. Then in Sec- 
tion 4 we present the D-SLS Resolution procedure. Our procedure is not only 
a combination of Ross’s Global SLS-resolution and SLI-resolution, it also ele- 
gantly incorporates the intuition of resolving default negation with disjunctive 
information. To illustrate our resolution procedure and its relation to some other 
semantic intuitions, two examples are given in Section 5. In Section 6 we state 
the soundness and completeness of D-SLS Resolution with respect to WFDS*. 
Finally, in Section 7 we conclude the paper. 



2 Preliminaries 

We assume the existence of an arbitrary, but fixed propositional language, gen- 
erated from a selected set of propositional symbols (atoms). An expression (dis- 
junction, formula, rule, or set of rules, etc) with variables is understood as an 
abbreviation for the set of all its grounded instances. If S is an expression, 
atoms(S) denotes the set of all atoms appearing in S. A general disjunctive logic 
program (simply, disjunctive program) P is defined as a finite set of rules of the 
form: 



$$p_1 \vee \cdots \vee p_t \leftarrow p_{t+1}, \ldots, p_s, not\ p_{s+1}, \ldots, not\ p_n. \qquad (1)$$

Here, $n \ge s \ge t > 0$ and the $p_i$ are atoms for $i = 1, \ldots, n$. The symbols ‘$\vee$’ and 
‘not’ denote (non-classical) disjunction and default negation, respectively. 

A literal is either an atom p or its default negation not p while not p is called 
a negative literal. 

The informal meaning of rule (1) is that “if $p_{t+1}, \ldots, p_s$ are true and 
$p_{s+1}, \ldots, p_n$ are all not provable, then one of $\{p_1, \ldots, p_t\}$ is true”. For example, 
$male(greg) \vee female(greg) \leftarrow animal(greg), not\ ab(greg)$ means, informally, 
that if greg is an animal and it is not provable that greg is abnormal, then greg 
is either male or female. 

If t = 1, rule (1) is said to be normal. P is a normal program if each rule of 
P is normal. 

If n = s, rule (1) is said to be positive. P is a positive disjunctive program if 
each rule of P is positive. 

If t = s, rule (1) is said to be negative. P is a negative disjunctive program if 
each rule of P is negative. 
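
The three syntactic classes can be made concrete with a small sketch. The `Rule` class and the helper names below are assumptions made for illustration only; they simply encode rule (1) as three sets of atoms and test the conditions t = 1, n = s and t = s.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    """Rule (1): head atoms p_1..p_t, positive body p_{t+1}..p_s, negated body p_{s+1}..p_n."""
    head: frozenset
    pos: frozenset = frozenset()
    neg: frozenset = frozenset()

def is_normal(rule):
    # t = 1: exactly one atom in the head
    return len(rule.head) == 1

def is_positive(rule):
    # n = s: no default-negated atoms in the body
    return not rule.neg

def is_negative(rule):
    # t = s: no positive atoms in the body
    return not rule.pos

# The example rule about greg from the text.
greg = Rule(head=frozenset({"male(greg)", "female(greg)"}),
            pos=frozenset({"animal(greg)"}),
            neg=frozenset({"ab(greg)"}))
```

The rule about greg is neither normal, positive nor negative, whereas a fact such as `Rule(frozenset({"c"}))` falls into all three classes.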

As usual, $B_P$ is the Herbrand base of disjunctive program $P$ (i.e. the set 
of all ground atoms in $P$). A positive (negative) disjunction is a disjunction of 
atoms (negative literals) in $P$. A pure disjunction is either a positive one or a 
negative one. The disjunctive base of $P$ is $DB_P = DB_P^+ \cup DB_P^-$, where $DB_P^+$ 
is the set of all positive disjunctions in $P$ and $DB_P^-$ is the set of all negative 
disjunctions in $P$. If $\alpha$ and $\beta = \alpha \vee \alpha'$ are two disjunctions, then we say $\alpha$ is a 
sub-disjunction of $\beta$. 

A model state of disjunctive program $P$ is a subset of $DB_P$. Usually, a 
well-founded semantics for disjunctive logic programs is defined as a mapping such 
that each disjunctive program P is assigned a model state. 

For simplicity, we also express a rule of form (1) as $\Sigma \leftarrow \Pi_1, not.\Pi_2$, where $\Sigma$ 
is a disjunction of atoms in $B_P$, $\Pi_1$ a finite subset of $B_P$ denoting a conjunction 
of atoms, and $not.\Pi_2 = \{not\ q \mid q \in \Pi_2\}$ for $\Pi_2 \subseteq B_P$ denoting a conjunction 
of negative literals. 

3 Skeptical Argumentation 

As illustrated in [12], argumentation can be used to define a unifying semantic 
framework for DLP. In this section, we first briefly recall the well-founded extension 
semantics WFDS in [12] and give a minor modification WFDS* of WFDS. 
The basic idea of the argumentation-based approach for DLP is to translate each 
disjunctive logic program into an argument framework $F_P = \langle P, DB_P^-, \leadsto_P \rangle$. 
Here, an assumption of $P$ is a negative disjunction of $P$, and a hypothesis is a set 
of assumptions; $\leadsto_P$ is an attack relation among the hypotheses. An admissible 
hypothesis $\Delta$ is one that can attack every hypothesis which attacks it. 

The intuitive meaning of an assumption $not\ a_1 \vee \cdots \vee not\ a_m$ is that 
$a_1 \wedge \cdots \wedge a_m$ cannot be proved from the disjunctive program. 

Given a hypothesis $\Delta$ of disjunctive program $P$, similar to the 
GL-transformation [6], we can easily reduce $P$ into another disjunctive program 
without default negation. 

Definition 1. Let $\Delta$ be a hypothesis of disjunctive program $P$, then the reduct 
of $P$ with respect to $\Delta$ is the disjunctive program 

$$P_\Delta^+ = \{A \leftarrow B \mid \text{there is a rule of form } A \leftarrow B, not.C \text{ in } P \text{ s.t. } not.C \subseteq \Delta\}.$$

Based on Definition 1, we will first introduce a special resolution $\vdash_P$ which 
resolves default-negation literals with a disjunction and can be intuitively 
illustrated by the following principle: 

If there is an agent who holds the assumptions $not\ b_1, \ldots, not\ b_m$ and can 
infer the disjunctive information $b_1 \vee \cdots \vee b_m \vee b_{m+1} \vee \cdots \vee b_n$, then the agent 
should be able to infer $b_{m+1} \vee \cdots \vee b_n$. 

The following definition precisely formulates this principle in the setting of 
DLP. 

Definition 2. Let $\Delta$ be a hypothesis of disjunctive program $P$ and $\alpha \in DB_P^+$. If 
there exist $\beta \in DB_P^+$ and $not\ b_1, \ldots, not\ b_m \in \Delta$ such that $\beta = \alpha \vee b_1 \vee \cdots \vee b_m$ 
and $P_\Delta^+ \vdash \beta$, then $\Delta$ is said to be a supporting hypothesis for $\alpha$, denoted $\Delta \vdash_P \alpha$. 
Here $\vdash$ is the classical inference; $P_\Delta^+$ is considered as a classical logic theory while 
$\beta$ is considered as a formula in classical logic. 

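The consequence set of $\Delta$ consists of all positive disjunctions that are 
supported by $\Delta$: 

$$cons_P(\Delta) = \{\alpha \mid \alpha \in DB_P^+,\ \Delta \vdash_P \alpha\}.$$

For example, if $P = \{a \vee b \leftarrow c, not\ d;\ c \leftarrow\}$ and $\Delta = \{not\ b, not\ d\}$, then 
$\Delta \vdash_P a$. 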
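
The reduct and the supporting relation can be checked mechanically on small propositional programs. The following is a minimal sketch under stated assumptions: rules are triples of frozensets (head, positive body, negated body), an assumption not a_1 ∨ ... ∨ not a_m is encoded as the frozenset of its atoms, and classical inference is decided by brute-force enumeration of truth assignments. The function names `reduct`, `entails` and `supports` are illustrative and not taken from the paper.

```python
from itertools import product

def reduct(program, delta):
    """Keep 'head <- pos' for every rule whose literals 'not c' all belong to delta."""
    return [(head, pos) for head, pos, neg in program
            if all(frozenset([c]) in delta for c in neg)]

def entails(positive_program, beta):
    """Classical entailment of the disjunction beta, by enumerating truth assignments."""
    atoms = sorted(set(beta).union(*(head | pos for head, pos in positive_program)))
    for values in product([False, True], repeat=len(atoms)):
        true = {a for a, v in zip(atoms, values) if v}
        # A rule holds if its body is not satisfied or some head atom is true.
        is_model = all(not pos <= true or head & true
                       for head, pos in positive_program)
        if is_model and not set(beta) & true:
            return False  # this model of the reduct falsifies beta
    return True

def supports(program, delta, alpha):
    """Decide whether delta is a supporting hypothesis for the disjunction alpha.

    Entailment is monotone in the disjunction, so it is enough to test the
    largest candidate beta: alpha extended with every b such that 'not b' is in delta.
    """
    assumed = {a for d in delta if len(d) == 1 for a in d}
    return entails(reduct(program, delta), set(alpha) | assumed)

# The example from the text: P = {a v b <- c, not d;  c <-}, Delta = {not b, not d}.
P = [(frozenset("ab"), frozenset("c"), frozenset("d")),
     (frozenset("c"), frozenset(), frozenset())]
DELTA = {frozenset("b"), frozenset("d")}
```

With these definitions, `supports(P, DELTA, {"a"})` holds, while dropping `not b` from the hypothesis leaves only the weaker disjunction a ∨ b supported. The enumeration is exponential in the number of atoms, so this is meant for checking small examples only.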

The task of defining a semantics for a disjunctive logic program P is to de- 
termine the state that can represent the intended meaning of P. Here we first 
specify the negative information in the semantics and then derive the positive 
part. To derive suitable hypotheses for a given disjunctive program, some con- 
straints will be required to filter out unintuitive hypotheses. 

Definition 3. Let $\Delta$ and $\Delta'$ be two hypotheses of disjunctive program $P$. If at 
least one of the following two conditions holds: 

1. there exists $\beta = not\ b_1 \vee \cdots \vee not\ b_m \in \Delta'$, $m > 0$, such that $\Delta \vdash_P b_i$ for 
all $i = 1, \ldots, m$; or 

2. there exist $not\ b_1, \ldots, not\ b_m \in \Delta'$, $m > 0$, such that $\Delta \vdash_P b_1 \vee \cdots \vee b_m$, 

then we say $\Delta$ attacks $\Delta'$, denoted $\Delta \leadsto_P \Delta'$. 

Intuitively, $\Delta \leadsto_P \Delta'$ means that $\Delta$ causes a direct contradiction with $\Delta'$, 
and the contradiction may come from one of the above two cases. 

Example 1. 

$a \vee b \leftarrow d$ 
$c \leftarrow d, not\ a, not\ b$ 
$d \leftarrow$ 
$e \leftarrow not\ e$ 

Let $\Delta' = \{not\ c\}$ and $\Delta = \{not\ a, not\ b\}$, then $\Delta \leadsto_P \Delta'$. 

The next definition specifies what is an acceptable hypothesis. 

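Definition 4. Let $\Delta$ be a hypothesis of disjunctive program $P$. An assumption 
$\beta \in DB_P^-$ is admissible with respect to $\Delta$ if $\Delta \leadsto_P \Delta'$ holds for any hypothesis $\Delta'$ 
of $P$ such that $\Delta' \leadsto_P \{\beta\}$. 

Denote $\mathcal{A}_P(\Delta) = \{\alpha \in DB_P^- \mid \alpha \text{ is admissible with respect to } \Delta\}$. 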

For any disjunctive program $P$, $\mathcal{A}_P$ is a monotonic operator: $\Delta \subseteq \Delta'$ implies 
$\mathcal{A}_P(\Delta) \subseteq \mathcal{A}_P(\Delta')$ for any two hypotheses $\Delta$ and $\Delta'$ of $P$. Thus, $\mathcal{A}_P$ has the least 
fixpoint $lfp(\mathcal{A}_P)$. Since $B_P$ is finite in this paper, the fixpoint can be obtained in 
finitely many steps by iterating $\mathcal{A}_P$ from the empty set. That is, $lfp(\mathcal{A}_P) = \mathcal{A}_P^k(\emptyset)$ for some $k \ge 0$. 

Definition 5. The well-founded disjunctive hypothesis $WFDH(P)$ of disjunctive 
program $P$ is defined as the least fixpoint of the operator $\mathcal{A}_P$. That is, 
$WFDH(P) = \mathcal{A}_P \uparrow \omega$. 

The well-founded extension semantics WFDS for $P$ is defined as the model 
state $WFDS(P) = WFDH(P) \cup cons_P(WFDH(P))$. 

By the above definition, $WFDS(P)$ is uniquely determined by $WFDH(P)$. 
For the program $P$ in Example 1, $WFDS(P) = \{a \vee b, d, not\ c, not\ a \vee not\ b\}$. 
To compare with different semantics, $\mathcal{A}_P$ can be modified by defining 

$$\mathcal{A}_P(\Delta) = \{\beta \in DB_P^- \mid not\ q \text{ is admissible w.r.t. } \Delta \text{ for some literal } not\ q \text{ in } \beta\}.$$

Parallel to the definition of WFDS, we can get a new well-founded semantics 
denoted WFDS* for disjunctive programs, which is a modification of WFDS. 
For instance, let $P$ be the program given in Example 1, then $WFDS^*(P) = 
\{a \vee b, d, not\ c\}$. Now $not\ a \vee not\ b$ is no longer in $WFDS^*(P)$. 

We can prove that WFDS is no less strong than WFDS* in the following 
sense. 

Proposition 1. For any disjunctive program $P$ and $\alpha \in DB_P$, if $\alpha \in 
WFDS^*(P)$, then $\alpha \in WFDS(P)$. 

As we have seen above, the converse of this proposition is not true in general. 
Specifically, WFDS allows more negative disjunctions to be inferred. However, 
this is not a big difference as the following results show. In fact, except for this 
difference, these two semantics coincide. 

Proposition 2. Let $\Delta$ be an admissible hypothesis of disjunctive program $P$. If 
$\beta \in DB_P^-$ but is not a literal, then 

1. a hypothesis $\alpha$ is admissible w.r.t. $\Delta$ iff it is admissible w.r.t. $\Delta - \{\beta\}$. 

2. for any positive disjunction $D$, $D \in cons_P(\Delta)$ iff $D \in cons_P(\Delta - \{\beta\})$. 

An interesting result is the equivalence of WFDS and WFDS*. 

Theorem 1. Let $P$ be a disjunctive program. Then 

1. $not\ p \in WFDS^*(P)$ iff $not\ p \in WFDS(P)$ for any atom $p$. 

2. $\alpha \in WFDS^*(P)$ iff $\alpha \in WFDS(P)$ for any positive disjunction $\alpha$. 

This theorem shows that WFDS* differs from WFDS only in the set of true 
negative disjunctions that each of them derives. 

4 D-SLS Resolution 

In this section, we will define a top-down procedure, called D-SLS Resolution, for 
disjunctive well-founded semantics. This procedure combines the idea of Global 
SLS-resolution in [10] with a linear resolution procedure. The linear resolution 
is a generalization of the SLI-resolution presented in [8]. One key part in our 
procedure is the incorporation of resolving default negation with disjunctive in- 
formation into SLS-resolution. D-SLS Resolution will be based on the notion of 
D-SLS tree, which in turn depends on the notion of positive trees. In 
Section 6, we show that D-SLS Resolution is sound and complete with respect 
to the disjunctive well-founded semantics WFDS and WFDS*. To achieve com- 
pleteness of D-SLS Resolution, we adopt the so-called positivistic computation 
rule, that is, we always select positive literals ahead of negative ones. 

A goal $G$ is of the form $\leftarrow D_1, \ldots, D_r, \neg b_1, \ldots, \neg b_m, not\ c_1, \ldots, not\ c_n$, where 
each $D_i$ is a positive disjunction; all $b_i$ and $c_i$ are atoms. To distinguish from 
default literals, we shall say that $l$ is a classic literal if $l = p$ or $l = \neg p$. 

In our resolution-like procedure, given a rule $C: \Sigma \leftarrow \Pi_1, not.\Pi_2$, we 
transform $C$ into a goal $gt(C): \leftarrow \neg\Sigma, \Pi_1, not.\Pi_2$ and call it the goal transformation 
of $C$, where $\neg\Sigma = \{\neg p \mid p \in \Sigma\}$. 

Since our resolution is to resolve literals in both heads and bodies of rules, this 
transformation allows a unifying and simple approach to defining resolution-like 
procedure for disjunctive logic programs as we shall see. 

The special goal $\leftarrow$ is called an empty goal. The empty goal $\leftarrow$ is also written 
as the familiar symbol □. A non-empty goal of form $\leftarrow \neg\Sigma, not.\Pi$ is said to be 
a negative goal. 

Given a disjunctive program $P$, set $gt(P) = \{gt(C) : C \in P\}$. The 
traditional goal resolution can be generalized as follows. 

Disjunctive Goal Resolution (DGR) If $G: \leftarrow b_1 \vee \cdots \vee b_r, G_1$ and 
$G': \leftarrow \neg b_1, \ldots, \neg b_s, G_2$ are two goals with $s \le r$, then the DGR-resolvent of $G$ with 
$G'$ on the selected disjunction $b_1 \vee \cdots \vee b_r$ is the goal $\leftarrow G_1, G_2$. 

It should be noted that resolution rule DGR incorporates several resolution rules 
including Goal resolution, Ancestor resolution and Body literal resolution [8,15]. 
Since we allow positive disjunctions in goals and the resolution rule, DGR is 
more powerful than the above mentioned three resolution rules as the following 
example shows. 

Example 2. Let $P$ be the following disjunctive program: 

$a \vee b \leftarrow$ 
$c \leftarrow not\ a, not\ b$ 

Then $P$ can be transformed into the following set $gt(P)$ of goals: 

$\leftarrow \neg a, \neg b$ 
$\leftarrow \neg c, not\ a, not\ b$ 

Then the DGR-resolvent of $G: \leftarrow c$ with the second goal is $\leftarrow not\ a, not\ b$, 
which can be obtained by the ordinary goal resolution; however, the 
DGR-resolvent of goal $\leftarrow a \vee b$ with the first goal is □, which cannot be obtained by 
any of those three resolution rules. 
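
The goal transformation and the DGR rule of Example 2 can be replayed with a few lines of code. This is a sketch under assumptions that are not fixed by the text: a goal is a list of tagged literals, where `("pos", D)` is a positive disjunction given as a frozenset of atoms, `("neg", p)` a classic negative literal and `("not", p)` a default literal; the resolvent removes from the second goal every classic negative literal whose atom occurs in the selected disjunction and requires at least one such literal. The names `gt` and `dgr` mirror the notation of the text but are otherwise hypothetical.

```python
def gt(head, pos=(), neg=()):
    """Goal transformation of the rule 'head <- pos, not neg'."""
    return ([("neg", p) for p in head]
            + [("pos", frozenset([p])) for p in pos]
            + [("not", q) for q in neg])

def dgr(goal, selected, other):
    """DGR-resolvent of `goal` with `other` on the disjunction `selected`, or None."""
    if ("pos", selected) not in goal:
        return None
    # Classic negative literals of `other` that are covered by the selected disjunction.
    matched = [lit for lit in other if lit[0] == "neg" and lit[1] in selected]
    if not matched:
        return None
    rest_of_goal = [lit for lit in goal if lit != ("pos", selected)]
    rest_of_other = [lit for lit in other if lit not in matched]
    return rest_of_goal + rest_of_other

# gt(P) for the program of Example 2.
G_AB = gt(["a", "b"])              # <- ¬a, ¬b
G_C = gt(["c"], neg=["a", "b"])    # <- ¬c, not a, not b
```

Resolving the goal ← c against `G_C` leaves the default literals not a, not b, and resolving ← a ∨ b against `G_AB` yields the empty list, which plays the role of the empty goal □.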

Definition 6. Let $P$ be a disjunctive program and $G$ a goal. A positive tree $T_G^+$ 
for $G$ is defined as follows: 

1. The root of $T_G^+$ is $G$. 

2. For each node $G': \leftarrow \neg\Sigma', p, \Pi_1', not.\Pi_2'$, and each goal $G_i$ in $gt(P)$, if $G''$ 
is the DGR-resolvent of $G'$ with $G_i$ on $p$ and $G''$ is different from all nodes 
in the branch of $G'$, then $G'$ has a child $G''$. 

A node labeled $\leftarrow \neg p_1, \ldots, \neg p_n, not\ p_{n+1}, \ldots, not\ p_m$ ($m \ge n \ge 0$) is called 
an active node. 

Thus, an active node is either the empty goal or a negative goal. For a 
negative goal, its success/failure has to be decided in subsequent stages. Now we 
define the D-SLS tree for a goal in terms of positive trees. 

Definition 7. (D-SLS Tree) Let $P$ be a disjunctive logic program and $G$ a 
goal. The D-SLS tree $\Gamma_G$ for $G$ is a tree whose nodes are of two types: negation 
nodes and tree nodes. Tree nodes are actually positive trees for intermediate 
goals. The nodes of $\Gamma_G$ are defined inductively as follows: 

1. The root of $\Gamma_G$ is the positive tree $T_G^+$ for the goal $G$. 

2. For any tree node $T_{G'}^+$ of $\Gamma_G$, the children of $T_{G'}^+$ are negation nodes, one 
corresponding to each active leaf of $T_{G'}^+$ (there will be a negation node 
corresponding to an empty active leaf). 

3. Let $J$ be a negation node corresponding to the active leaf $\leftarrow Q$ where 
$Q = \{not\ q_1, \ldots, not\ q_n\}$ and $n \ge 0$. $J$ is denoted $N(\leftarrow Q)$. Then, if $n > 0$, $J$ 
has one child, which is the positive tree $T^+_{\leftarrow q_1 \vee \cdots \vee q_n}$. 

We distinguish three types of nodes in a D-SLS tree (successful nodes, failed 
nodes and indeterminate nodes) according to the following rules. Successful and 
failed nodes also have an associated level. 

1. For a negation node $J$, 

(a) if the child of $J$ is a successful tree node, then we say 
$J$ is failed. The level of $J$ is the level of its successful child. 

(b) if the child of $J$ is a failed tree node, or if $J$ has no 
children, then we say $J$ is successful. The level of $J$ is the level of the 
child of $J$ (if $J$ has no children, the level of $J$ is 0). 

2. For a tree node $T$, 

(a) if every child of $T$ is a failed negation node, or if $T$ is a leaf 
of $\Gamma_G$ (i.e. $T$ has no active leaves), then we say $T$ is failed. $T$ has 
level 1 if $T$ is a leaf; the level of $T$ is $k+1$ if the maximum of the levels 
of the children of $T$ is $k$. 

(b) if some child of $T$ is a successful negation node, then we say 
$T$ is successful. A non-root tree node $T$ has level $k+1$ if the minimum 
level of all its successful children is $k$. The root tree node may have several 
associated levels, one for each successful child; the level of the root tree 
node with respect to such a successful child is one more than the level of 
the child. 

3. We say a node is well determined if it is either successful or failed. Otherwise, 
we say the node is indeterminate. 

Let $L$ be an active leaf of a tree node in $\Gamma_G$. We may say that $L$ is successful 
(resp. failed or indeterminate) if the corresponding negation node is successful 
(resp. failed or indeterminate). We may also say that the goal $G$ is successful 
(resp. failed or indeterminate) if $T_G^+$ is successful (resp. failed or indeterminate). 
Compared to Global SLS-resolution for normal logic programs, D-SLS Resolu- 
tion has the following major features: 

1. the underlying reasoning mechanism for D-SLS Resolution is a gener- 
alization of SLI-resolution while the underlying reasoning mechanism for SLS- 
resolution is SLD-resolution; 

2. each negation node has just one child in D-SLS Resolution while a negation 
node may have several children in Global SLS-resolution because disjunctions 
are allowed now. This makes a simpler form of D-SLS Resolution. 

To guarantee the termination of D-SLS Resolution, we also assume that 
no node is repeated. That is, whenever a repeated node in a D-SLS tree is 
found, the extension of the tree is stopped. 

5 Examples 

Let us look at some illustrating examples. 

Example 3. Consider the following disjunctive program $P$: 

$a \vee b \leftarrow c, not\ d$ 
$e \leftarrow not\ a, not\ b$ 
$e \leftarrow not\ a, g$ 
$c \leftarrow$ 
$a \leftarrow not\ c$ 

It can be verified that $not\ e \in WFDS^*(P)$ and thus $not\ e \in WFDS(P)$. 

Now let us see how $not\ e$ is inferred by D-SLS Resolution. 

First, $P$ is transformed into $gt(P)$, which consists of the following goals: 

$G_1: \leftarrow \neg a, \neg b, c, not\ d$ 
$G_2: \leftarrow \neg e, not\ a, not\ b$ 
$G_3: \leftarrow \neg e, not\ a, g$ 
$G_4: \leftarrow \neg c$ 
$G_5: \leftarrow \neg a, not\ c$ 

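In fact, we have the following D-SLS tree $\Gamma_{\leftarrow e}$ for $\leftarrow e$: 

[Figure: D-SLS tree. The root $T^+_{\leftarrow e}$ has the single negation child $N_1(\leftarrow not\ a, not\ b)$, whose child is $T^+_{\leftarrow a \vee b}$. This tree node has two negation children: $N_2(\leftarrow not\ d)$ with child $T^+_{\leftarrow d}$, and $N_3(\leftarrow not\ c)$ with child $T^+_{\leftarrow c}$.] 

The positive tree $T^+_{\leftarrow e}$ for $\leftarrow e$ is as follows: 

[Figure: root $\leftarrow e$ with two children, $\leftarrow not\ a, not\ b$ (resolving with $G_2$) and $\leftarrow not\ a, g$ (resolving with $G_3$).] 

The positive tree $T^+_{\leftarrow a \vee b}$: 

[Figure: root $\leftarrow a \vee b$ with two children, $\leftarrow c, not\ d$ (resolving with $G_1$) and $\leftarrow not\ c$ (resolving with $G_5$); the node $\leftarrow c, not\ d$ has the child $\leftarrow not\ d$ (resolving with $G_4$).] 

The positive tree $T^+_{\leftarrow d}$ consists of only the root node $\leftarrow d$. 
The positive tree $T^+_{\leftarrow c}$ is 

[Figure: root $\leftarrow c$ with the single child $\leftarrow$ (resolving with $G_4$).] 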
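
By Definition 7, 

$T^+_{\leftarrow d}$ is a leaf of $\Gamma_{\leftarrow e}$ $\Rightarrow$ 
$T^+_{\leftarrow d}$ is a failed tree node $\Rightarrow$ 
$N_2$ is a successful negation node $\Rightarrow$ 
(no matter what $N_3$ is) the tree node $T^+_{\leftarrow a \vee b}$ is successful $\Rightarrow$ 
$N_1$ is a failed node (and $N_1$ is the only negation child of $T^+_{\leftarrow e}$) $\Rightarrow$ 
$T^+_{\leftarrow e}$ is a failed (root) node $\Rightarrow$ 
the goal $\leftarrow e$ is failed. 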
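
The propagation of success and failure through the tree of Example 3 can be replayed by a short recursive sketch. It assumes a deliberately simple encoding that is not part of the paper: a tree node is the list of its negation-node children, and a negation node is represented by its single child tree node, or by `None` when it has no child (an empty active leaf). Levels and indeterminate nodes are left out, so the sketch only covers finite trees without repeated nodes.

```python
def negation_status(child_tree):
    """Status of a negation node, given its child tree node (None if it has no child)."""
    if child_tree is None:
        return "successful"  # rule 1(b): no children
    # rule 1(a) and 1(b): the status is the opposite of the child's status
    return "failed" if tree_status(child_tree) == "successful" else "successful"

def tree_status(negation_children):
    """Status of a tree node, given the list of its negation-node children."""
    statuses = [negation_status(child) for child in negation_children]
    # rule 2(b): one successful child suffices; rule 2(a): otherwise failed
    return "successful" if "successful" in statuses else "failed"

# The D-SLS tree for <- e in Example 3.
T_D = []            # positive tree for <- d: no active leaves
T_C = [None]        # positive tree for <- c: one empty active leaf
T_AB = [T_D, T_C]   # children N2 (child T_D) and N3 (child T_C)
T_E = [T_AB]        # child N1 (child T_AB)
```

Evaluating `tree_status(T_E)` gives "failed", in agreement with the derivation above: N2 succeeds because its child has no active leaves, which makes the tree node for ← a ∨ b successful regardless of N3.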

To guarantee the termination of D-SLS Resolution, we also assume that no 
node is repeated. That is, whenever a repeated node in a D-SLS tree is found, 
the extension of the tree is stopped. For example, if we replace the last rule 
$a \leftarrow not\ c$ in the above example with the rule $a \leftarrow not\ a, not\ b$, then $N_3 = N_1$ 
and thus $N_3$ and its children (if any) will be deleted from $\Gamma_{\leftarrow e}$. 

It should be noted that D-SLS Resolution is different from the SLIN- 
resolution [14]. We demonstrate this by the following example. 

Example 4. Let $P$ consist of two rules: 

$a \vee b \leftarrow$ 
$c \leftarrow not\ a, not\ b$ 

Although $not\ c \in WFDS^*(P)$, $c$ is indeterminate with respect to the 
SLIN-resolution. This means that SLIN-resolution is not complete for the disjunctive 
well-founded semantics WFDS*. However, the D-SLS tree $\Gamma_{\leftarrow c}$ for the goal $\leftarrow c$ is 
failed (to save space, the tree is drawn in one line because each of its internal 
nodes has a unique child): 

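$T^+_{\leftarrow c}$ -- $N_1(\leftarrow not\ a, not\ b)$ -- $T^+_{\leftarrow a \vee b}$ -- $N_2(\leftarrow)$ -- $T^+$ 

Here, the last tree $T^+$ has a unique node, and the positive trees $T^+_{\leftarrow c}$ and $T^+_{\leftarrow a \vee b}$ are as follows, respectively: 

[Figure: positive tree $T^+_{\leftarrow c}$ with root $\leftarrow c$ and the single child $\leftarrow not\ a, not\ b$ (resolving with $G_2$); positive tree $T^+_{\leftarrow a \vee b}$ with root $\leftarrow a \vee b$ and the single child $\leftarrow$ (resolving with $G_1$).] 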



It is easy to see that $T^+_{\leftarrow a}$ is an indeterminate node of the D-SLS tree for the 
goal $\leftarrow a$. 

It should be noted that, in a D-SLS tree, an active node containing classic negative 
literals cannot be ignored¹. That is, there may be a negation node having classic 
negative literals as child in a D-SLS tree. Consider the following program: 

$b \vee l \leftarrow not\ p$ 
$l \vee p \leftarrow$ 

¹ This question was raised by one referee. 




A Top-Down Procedure for Disjunctive Well-Founded Semantics 






The D-SLS tree $\Gamma_{\leftarrow b}$ for the goal $\leftarrow b$ is as follows: 

$T^+_{\leftarrow b}$ -- $N(\leftarrow not\ p)$ -- $T^+_{\leftarrow p}$ 

It can be verified that the goal $\leftarrow b$ is failed. 

6 Soundness and Completeness of D-SLS Resolution 

In this section, we address the soundness and completeness of D-SLS Resolu- 
tion. We first show that D-SLS Resolution is sound and complete w.r.t. the 
argumentative semantics WFDS. Then, by Theorem 1, we get the soundness 
and completeness of D-SLS Resolution w.r.t. WFDS*. Although we allow a 
goal to have a very general form in our D-SLS Resolution, each goal $G$ 
considered in this section actually has one of the two forms: either $\leftarrow a_1 \vee \cdots \vee a_r$ or 
$\leftarrow a_1, \ldots, a_r, \neg b_1, \ldots, \neg b_m, not\ c_1, \ldots, not\ c_n$, where all $a_i$, $b_i$ and $c_i$ are atoms. 
Thus, from now on we will always mean either of the above forms when a goal 
is mentioned. The detailed proofs of the results in this section are not difficult but 
tedious, so we omit them here. 

Theorem 2. (Soundness of D-SLS Resolution w.r.t. WFDS) 

Let $P$ be a disjunctive logic program. Then 

1. If goal $G: \leftarrow q_1, \ldots, q_n$ is failed, then $not\ q_1 \vee \cdots \vee not\ q_n \in WFDS(P)$. 

2. If goal $G: \leftarrow q_1 \vee \cdots \vee q_n$ is successful, then $q_1 \vee \cdots \vee q_n \in WFDS(P)$. 

To prove Theorem 2, we need only show the following lemma. 

Lemma 1. Let $P$ be a disjunctive logic program. Then 

1. If goal $G: \leftarrow q_1, \ldots, q_n$ is failed, then $not\ q_1 \vee \cdots \vee not\ q_n \in WFDH(P)$. 

2. If goal $G: \leftarrow q_1 \vee \cdots \vee q_n$ is successful, then $WFDH(P) \vdash_P q_1 \vee \cdots \vee q_n$. 

Sketch of Proof. It suffices to prove that the following two propositions hold by 
simultaneous induction on the level $l(\Gamma_G)$ of the D-SLS tree $\Gamma_G$: 

S1 $not\ q_1 \vee \cdots \vee not\ q_n \in \mathcal{A}_P^k(\emptyset)$ if the goal $G: \leftarrow q_1, \ldots, q_n$ is failed and 
$l(\Gamma_G) = k \ge 1$; 

S2 $\mathcal{A}_P^k(\emptyset) \vdash_P q_1 \vee \cdots \vee q_n$ if the goal $G: \leftarrow q_1 \vee \cdots \vee q_n$ is successful and 
$l(\Gamma_G) = k \ge 0$. 



Theorem 3. (Completeness of D-SLS Resolution w.r.t. WFDS) 

Let $P$ be a disjunctive logic program. Then 

1. If $q_1 \vee \cdots \vee q_n \in WFDS(P)$, then the goal $\leftarrow q_1 \vee \cdots \vee q_n$ is successful. 

2. If $not\ q_1 \vee \cdots \vee not\ q_n \in WFDS(P)$, then the goal $G: \leftarrow q_1, \ldots, q_n$ is failed. 

This theorem follows directly from the next lemma. 

Lemma 2. Let $P$ be a disjunctive logic program and $G$ a goal of $P$. Then 

1. If $WFDH(P) \vdash_P q_1 \vee \cdots \vee q_n$, then the goal $G: \leftarrow q_1 \vee \cdots \vee q_n$ is successful. 

2. If $not\ q_1 \vee \cdots \vee not\ q_n \in WFDH(P)$, then the goal $G: \leftarrow q_1, \ldots, q_n$ is failed. 

Sketch of Proof. It is enough to show that both of the following C1 and C2 
hold by simultaneous induction on the level $l(\Gamma_G) = k$: 

C1 For $k \ge 1$, if $\mathcal{A}_P^{k-1}(\emptyset) \vdash_P q_1 \vee \cdots \vee q_n$, then the goal $G: \leftarrow q_1 \vee \cdots \vee q_n$ is 
successful and its level is $k$. 

C2 For $k \ge 1$, if $not\ q_1 \vee \cdots \vee not\ q_n \in \mathcal{A}_P^k(\emptyset)$, then the goal $G: \leftarrow q_1, \ldots, q_n$ 
is failed and its level is no more than $k + 1$. 

By the definition of WFDS*, for any atoms $q_1, \ldots, q_n$, $not\ q_1 \vee \cdots \vee not\ q_n \in 
WFDS^*(P)$ if and only if $not\ q_i \in WFDS^*(P)$ for some $q_i$. Thus, the following two 
theorems follow directly from Theorem 1 and the two theorems above. 

Theorem 4. (Soundness of D-SLS Resolution w.r.t. WFDS*) 

Let $P$ be a disjunctive logic program. Then 

1. If goal $G: \leftarrow q_i$ is failed for some $q_i$, then $not\ q_1 \vee \cdots \vee not\ q_n \in WFDS^*(P)$. 

2. If goal $G: \leftarrow q_1 \vee \cdots \vee q_n$ is successful, then $q_1 \vee \cdots \vee q_n \in WFDS^*(P)$. 

Theorem 5. (Completeness of D-SLS Resolution w.r.t. WFDS*) 

Let $P$ be a disjunctive logic program. Then 

1. If $q_1 \vee \cdots \vee q_n \in WFDS^*(P)$, then the goal $\leftarrow q_1 \vee \cdots \vee q_n$ is successful. 

2. If $not\ q_1 \vee \cdots \vee not\ q_n \in WFDS^*(P)$, then the goal $G: \leftarrow q_i$ is failed for some 
$q_i$. 

7 Conclusion 

The main contribution of this paper is that we have proposed a top-down proce- 
dure D-SLS Resolution for disjunctive well-founded semantics. This resolution- 
like procedure extends both the Global SLS-resolution [10] and SLI-resolution [8]. 
We prove that D-SLS Resolution is sound and complete with respect to the dis- 
junctive well-founded semantics WFDS and WFDS*. We know that the Global 
SLS-resolution is a classic procedure for the well-founded semantics of normal 
logic programs while SLI-resolution is the most important procedure for posi- 
tive disjunctive programs. D-SLS Resolution is actually a novel characterization 
for WFDS* and thus provides a further yet powerful argument in favor of the 
semantics WFDS*. On the other hand, the results in this paper pave a promis- 
ing way to implement the WFDS* by employing some existing theorem provers. 
Although this point has been made clear for Brass and Dix’s D-WFS in [3], 
no top-down procedure is provided for their semantics. It is worth noting that 
D-SLS Resolution in the current form is not efficient yet. We are currently 
working on a more efficient algorithm for D-SLS Resolution by employing some 
techniques such as the tabling method [4]. 

Abstract. Circumscription is naturally expressed in second-order logic, 
but previous implementations all work by handling cases that can be re- 
duced to first-order logic. Making use of a new second-order unification 
algorithm introduced in [3], we show how a theorem prover can be made 
to find proofs in second-order logic, in particular proofs by circumscription. 
We work out a blocks-world example in complete detail and give the 
output of an implementation, demonstrating that it works as claimed. 



1 Introduction 

Circumscription was introduced by John McCarthy [11] as a means of formalizing 
“common-sense reasoning” for artificial intelligence. It served as the foundation 
of his theory of non-monotonic reasoning. The essential idea is to introduce, 
when axiomatizing a situation, a predicate ab for “abnormality” , and to axiom- 
atize the ab predicate by saying it is the least predicate such that the other 
axioms are valid. Some other predicates may be allowed to “vary” in the mini- 
mization as well. There are several technical difficulties with McCarthy’s idea: 
First, the circumscription principle is most naturally expressed in second-order 
logic, where we have variables over predicates of objects. Second, unless the rest 
of the axioms contain ab only positively, the circumscription principle is not an 
ordinary inductive definition, and there may not even be a (unique) least solu- 
tion for the ab predicate, so the circumscription principle can be inconsistent. 
McCarthy’s ultimate goal was implementation of software using the circumscrip- 
tion principle to construct artificial intelligence. Believing that implementation 
of second-order logic was not a practical approach, many researchers have tried 
various methods of reducing special cases of the circumscription principle to 
first-order logic; see [7] for a summary of these efforts. Some of these reductions 
were in turn implemented. 

In this paper we take the other path, and exhibit a direct implementation of 
second-order logic which is capable of handling some circumscription problems. 
The key to making this work is a new notion of second-order unification. This 
notion of unification was introduced in [3], where some theorems about it are 
proved. In that paper, I pointed out the possibility of converting your favorite 
first-order theorem prover to a second-order theorem prover by adding second- 
order unification. This paper shows explicitly how this can be done, and that the 
resulting second-order prover can indeed find circumscription proofs. Note that
it would already be interesting if the resulting proof-checker could accept and 
verify circumscription proofs, but the essential point of this paper is that the use 
of the new unification algorithm of [3] enables a simple theorem-prover to find 
circumscription proofs by itself. The hard part of this, of course, is finding the 
correct values of the second-order predicates involved. These are generally given
by λ-terms involving an operator for definition by cases. It is therefore essential
to use a formalization of second-order logic which has terms for definition by 
cases. 

A longer version of this paper is available on the Web [1]. It includes ad- 
ditional background and details, and the complete computer-produced proof 
of the example treated here. In particular the exact syntax of our version of 
second-order logic, including application terms and lambda terms, is given there. 
Second-order unification, and its application to circumscription, both depend on 
the use of conditional terms, or case-terms. These are terms of the form 

\[
\begin{cases} P(x) & \text{if } x = y \\ Q(x) & \text{otherwise} \end{cases}
\]

There are several different notations for such terms in use, including the form 
used in the C and Java programming languages: 

\[ x = y \;?\; P(x) : Q(x) \]

and the form used in [3] and in the theories of Feferman [2]:

\[ d(x, y, P(x), Q(x)). \]

The form with d is the one in the official syntax, but the other two forms are 
both more readable. The syntax used by our computer implementation allows a 
more general kind of case term in which there can be several cases, instead of 
just one, before the “otherwise” term. For representing such terms the notation 
with a brace is more readable, so the output of the prover is presented in that 
notation. For writing papers, the notation with a question mark is more compact 
and equally readable, so we will use it in the paper. 
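
As a rough illustration of how such terms behave, the sketch below models a case term as a small Python data structure and reduces it under an assignment of values to variables. The names `Case` and `evaluate`, and the use of a dictionary as environment, are assumptions made for this example only; they are not taken from the prover's implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class Case:
    """A case term: a list of (lhs, rhs, value) branches and an 'otherwise' value.

    A branch is selected when lhs and rhs evaluate to the same object.
    With a single branch this corresponds to d(lhs, rhs, value, otherwise).
    """
    branches: List[Tuple[Any, Any, Any]]
    otherwise: Any


def evaluate(term: Any, env: Dict[str, Any]) -> Any:
    """Evaluate a term: variables are looked up in env, case terms are reduced."""
    if isinstance(term, Case):
        for lhs, rhs, value in term.branches:
            if evaluate(lhs, env) == evaluate(rhs, env):
                return evaluate(value, env)
        return evaluate(term.otherwise, env)
    if isinstance(term, str) and term in env:
        return env[term]
    return term  # anything else is treated as a constant


# x = y ? P : Q, i.e. d(x, y, P, Q)
d = Case([("x", "y", "P")], "Q")

# A more general term with two cases before the "otherwise" value
multi = Case([("z", "b", False), ("z", "a", "ab(b)")], "A(z)")
```

For instance, `evaluate(d, {"x": 1, "y": 1})` selects the first branch, while `evaluate(multi, {"z": "a"})` falls through the first case and selects the second.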

We use a Gentzen-sequent formulation of second-order logic. We simply take 
the usual Gentzen rules (e.g. G3 as in [9]) for both predicate and object quan- 
tifiers. The G3 rules need to be supplemented with rules corresponding to the 
formation of λ-terms and ap-terms, as well as with rules corresponding to the
introduction of case terms in both antecedent and succedent. We do not repeat 
the G3 rules here, but here are the other rules: 

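\[
\frac{t = s,\ A,\ \Gamma \Rightarrow C \qquad t \neq s,\ B,\ \Gamma \Rightarrow C}
     {d(t, s, A, B),\ \Gamma \Rightarrow C}
\]

\[
\frac{t = s,\ \Gamma \Rightarrow A}{\Gamma \Rightarrow d(t, s, A, B)}
\qquad
\frac{t \neq s,\ \Gamma \Rightarrow B}{\Gamma \Rightarrow d(t, s, A, B)}
\]

\[
\frac{\Gamma \Rightarrow A[t/x]}{\Gamma \Rightarrow (\lambda x.A)t}
\qquad
\frac{\Gamma,\ A[t/x] \Rightarrow \phi}{\Gamma,\ (\lambda x.A)t \Rightarrow \phi}
\]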

We will be using the notion of unification introduced in [3]. This definition 
is also reviewed in [1], where it is also compared to Huet’s notion. 

2 Circumscription 

If $U$ and $V$ are predicate expressions of the same arity, then $U \le V$ stands for
$\forall x(U(x) \to V(x))$. If $U = U_1, \ldots, U_n$ and $V = V_1, \ldots, V_n$ are similar tuples of
predicate expressions, i.e. $U_i$ and $V_i$ are of the same arity, $1 \le i \le n$, then $U \le V$
is an abbreviation for $\bigwedge_{i=1}^{n} U_i \le V_i$. We write $U = V$ for $U \le V \land V \le U$, and
$U < V$ for $U \le V \land \neg(V \le U)$.

Definition 1 (Second-Order Circumscription). Let $P$ be a tuple of distinct
predicate constants, $S$ be a tuple of distinct function and/or predicate constants
disjoint from $P$, and let $T(P; S)$ be a sentence. The second-order circumscription
of $P$ in $T(P; S)$ with variable $S$, written $\mathit{Circ}(T; P; S)$, is given in [7] as

\[ T(P; S) \land \forall \Phi \Psi\, \neg[T(\Phi, \Psi) \land \Phi < P] \]

where $\Phi$ and $\Psi$ are tuples of variables similar to $P$ and $S$, respectively. This can
equivalently be stated in the form

\[ T(P; S) \land \forall \Phi \Psi\, [T(\Phi, \Psi) \land \Phi \le P \to P \le \Phi], \]

which is the form our prover uses.

3 Blocks World Example 

We treat the first example from [7] as a typical circumscription problem. 

Let $T(\mathit{ab}, \mathit{On})$ be the theory

\[ a \neq b \land \neg\mathit{On}(a) \land \forall x(\neg\mathit{ab}(x) \to \mathit{On}(x)) \]

where the variables range over “blocks” and $\mathit{On}(x)$ means “$x$ is on the table”.
Circumscription enables us to conclude that $a$ is the only block not on the table.
For simplicity, we first consider the problem without the predicate $B$, i.e. we
assume all variables range only over blocks. The idea is that normal blocks are
on the table, and since $a$ is the only abnormal block, $b$ is a normal block and
hence is on the table. Circumscription should enable us to prove $\mathit{On}(b)$.

Circumscription in this example is taken to minimize $\mathit{ab}$ with variable $\mathit{On}$,
so in the general schema above, we take $P$ to be $\mathit{ab}$ and $S$ to be $\mathit{On}$.



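\[ a \neq b \tag{1} \]

\[ \forall x(\neg\mathit{ab}(x) \to \mathit{On}(x)) \tag{2} \]

\[ \neg\mathit{On}(a) \tag{3} \]

\[ \forall \Phi \Psi\, [\forall x(\neg\Phi(x) \to \Psi(x)) \land \neg\Psi(a) \land \Phi \le \mathit{ab} \to \mathit{ab} \le \Phi] \tag{4} \]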
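
The intended effect of circumscription on this theory can be checked semantically on a small finite domain. The sketch below is only an illustration under stated assumptions: it enumerates interpretations by brute force instead of doing any second-order reasoning, the block that is not on the table is called `"a"`, the inequality of the two named blocks is built in by using distinct domain elements, and the third block `"d"` as well as all function names are hypothetical.

```python
from itertools import chain, combinations


def subsets(domain):
    """All subsets of domain, as frozensets."""
    items = sorted(domain)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]


def models(domain, a):
    """All pairs (ab, on) with: not On(a), and for all x (not ab(x) -> On(x))."""
    return [(ab, on)
            for ab in subsets(domain) for on in subsets(domain)
            if a not in on and all(x in ab or x in on for x in domain)]


def circumscribed(domain, a):
    """Models in which ab is minimal, while On is allowed to vary."""
    ms = models(domain, a)
    return [(ab, on) for ab, on in ms
            if not any(ab2 < ab for ab2, _ in ms)]


domain = {"a", "b", "d"}  # "d" is an extra block, added for illustration
minimal = circumscribed(domain, "a")
```

On this domain the only minimal model has `ab = {a}` and `On = {b, d}`, so `On(b)` holds in every minimal model, which is the conclusion the prover is expected to reach.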



We first present a human-produced proof, for later comparison to the proof
found by our program. We take as the goal to prove $\mathit{On}(b)$. Backchaining from
(2) produces the new goal $\neg\mathit{ab}(b)$. The human then suggests the values

\[ \Phi = \lambda x.(x = a \;?\; \mathit{true} : \mathit{false}) \tag{5} \]

\[ \Psi = \lambda x.(x = a \;?\; \mathit{false} : \mathit{true}) \tag{6} \]

With these values of $\Phi$ and $\Psi$, we want to prove $\mathit{ab}(b) \to \mathit{false}$, so we need to
verify $\Phi(b) = \mathit{false}$. But $\Phi(b) = (b = a \;?\; \mathit{true} : \mathit{false})$, and $b = a$ evaluates to
false since $a \neq b$ is in the antecedent, so $\Phi(b)$ evaluates to false. It therefore
suffices to verify the hypothesis of (4), namely

\[ \forall x(\neg\Phi(x) \to \Psi(x)) \land \neg\Psi(a) \land \Phi \le \mathit{ab}. \]

Fix an $x$, and suppose $\neg\Phi(x)$. Then $x \neq a$, from which $\Psi(x)$ follows, which proves
the first conjunct. The second conjunct, $\neg\Psi(a)$, follows immediately by reduction
to true. The third conjunct, $\Phi \le \mathit{ab}$, is proved as follows: suppose $\Phi(x)$. Then
$x = a$ and so we must prove $\mathit{ab}(a)$. But by (3) we have $\neg\mathit{On}(a)$, and so by (2) we
have $\mathit{ab}(a)$. That completes the proof.

We now explain how the prover attacks this problem. We want to prove
$\neg\mathit{ab}(b)$. (Officially that goal is the succedent of a sequent whose antecedent is the
list of axioms.) So the prover assumes $\mathit{ab}(b)$, and the new goal is $\mathit{ab}(b) \Rightarrow \mathit{false}$. (Of
course officially the axioms should appear in the antecedent of the goal sequent,
too, but we do not write them.) This causes (4) to be “opened up”, introducing
metavariables $P$ and $Q$. The formula $\mathit{ab} \le Q$ is really $\forall w(\mathit{ab}(w) \to Q(w))$,
so a metavariable $W$ is introduced for $w$ as well, but soon it is instantiated
to $b$ to unify $\mathit{ab}(b)$ with $\mathit{ab}(W)$, in the hopes of proving $\mathit{ab}(W) \Rightarrow Q(W)$ from
$\mathit{ab}(b) \Rightarrow \mathit{false}$. Thus the prover tries to unify $Q(b)$ with false. This gives

\[ Q = \lambda y.(y = b \;?\; \mathit{false} : Y(y)) \]

where $Y$ is a new variable. The next goal is the conjunction of the three formulae
on the left of the implication in (4). These are taken in order; the first one is
$\forall v(\neg Q(v) \to P(v))$. Fixing $v$ the goal is $\neg Q(v) \to P(v)$; writing out the current
value of $Q$ and $\beta$-reducing, the goal is

\[ \neg(v = b \;?\; \mathit{false} : Y(v)) \to P(v). \]

There is a simplification rule for pushing a negation into a cases term, namely

\[ \neg(v = p \;?\; q : r) = (v = p \;?\; \neg q : \neg r). \]
So the goal becomes

\[ (v = b \;?\; \mathit{true} : \neg Y(v)) \to P(v). \]

This is solved by second-order unification, taking

\[ P = \lambda v.((v = b \;?\; \mathit{true} : \neg Y(v)) \lor Z(v)). \]
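
The rewriting step that pushes a negation into a cases term can be sketched as a small term rewriter. This is an illustrative reconstruction, not the prover's code: the classes `Not` and `Cases` and the functions `negate` and `push_negation` are names assumed for this example, and atomic formulas are represented simply as strings.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Not:
    arg: Any


@dataclass(frozen=True)
class Cases:
    """The term (lhs = rhs ? then : otherwise)."""
    lhs: Any
    rhs: Any
    then: Any
    otherwise: Any


def negate(term):
    """Negate a term, simplifying constants and double negations."""
    if term is True:
        return False
    if term is False:
        return True
    if isinstance(term, Not):
        return term.arg
    return Not(term)


def push_negation(term):
    """Rewrite not(v = p ? q : r) to (v = p ? not q : not r), recursively."""
    if isinstance(term, Not) and isinstance(term.arg, Cases):
        c = term.arg
        return Cases(c.lhs, c.rhs,
                     push_negation(negate(c.then)),
                     push_negation(negate(c.otherwise)))
    return term


# not (v = b ? false : Y(v))
goal = Not(Cases("v", "b", False, "Y(v)"))
result = push_negation(goal)
```

Here `result` is the term `(v = b ? true : not Y(v))`, the left-hand side of the goal obtained in the text.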

The next goal is $\neg P(a)$. That is, after a beta reduction,

\[ \neg((a = b \;?\; \mathit{true} : \neg Y(a)) \lor Z(a)). \]

Now we can apply a simplification rule using the axiom $a \neq b$, reducing the
cases term to $\neg Y(a)$ and hence the whole goal to $\neg(\neg Y(a) \lor Z(a))$. Using rewrite
rules appropriate to classical logic we simplify this to $Y(a) \land \neg Z(a)$. Splitting the
conjunction into two subgoals, the first one to be proved is $Y(a)$. This is solved
by second-order unification, taking

\[ Y = \lambda u.(u = a \;?\; \mathit{ab}(b) : A(u)) \]

where $A$ is a new metavariable. You might think we should get true in place of
$\mathit{ab}(b)$ in the value of $Y$, but when the prover has to prove a goal of the form $Y(a)$,
it does not try to unify $Y(a)$ with true, but rather with one of the assumptions
(formulas in the antecedent). It tries the most recently added ones first, and it
finds $\mathit{ab}(b)$ there, which explains the value given for $Y$.

The second goal is $\neg Z(a)$. Then $Z(a)$ is assumed, leading to a goal
$Z(a) \Rightarrow \mathit{false}$. Unifying $Z(a)$ with false gives $Z$ the value

\[ Z = \lambda r.(r = a \;?\; \mathit{On}(a) : B(r)), \]

where $B$ is a new metavariable. Again, you might expect false to occur in place of
$\mathit{On}(a)$ in the value of $Z$, but the prover finds the value given, which is equivalent
since $\neg\mathit{On}(a)$ is an axiom.

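At this point, the value of $P$ has become

\[ P = \lambda v.(v = b \;?\; \mathit{true} : \neg(v = a \;?\; \mathit{true} : A(v))) \]

which simplifies to

\[ P = \lambda v.(v = b \;?\; \mathit{true} : v = a \;?\; \mathit{On}(a) : \neg A(v)) \]

The value of $Q$ is now given by

\[ Q = \lambda y.(y = b \;?\; \mathit{false} : (\lambda u.(u = a \;?\; \mathit{ab}(b) : A(u)))y) \]

which reduces to

\[ Q = \lambda y.(y = b \;?\; \mathit{false} : y = a \;?\; \mathit{ab}(b) : A(y)) \]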
The next goal is $Q \le \mathit{ab}$, that is $\forall z(Q(z) \to \mathit{ab}(z))$. Fixing $z$, the goal is
$Q(z) \to \mathit{ab}(z)$. Using the Gentzen rule for introducing $\to$ on the right, and writing out
the current value of $Q$, our goal is the sequent

\[ z = b \;?\; \mathit{false} : z = a \;?\; \mathit{ab}(b) : A(z) \Rightarrow \mathit{ab}(z). \]

This is proved by cases, specifically by the cases-left rule.

Case 1, $z = b$. The goal reduces to $\mathit{false} \to \mathit{ab}(z)$, which is immediate.

Case 2, $z \neq b$ and $z = a$. Then $Q(z)$ reduces to

\[ z = a \land z \neq b \land \mathit{ab}(b) \]

so the goal becomes

\[ z = a,\ z \neq b,\ \mathit{ab}(b) \Rightarrow \mathit{ab}(z). \]

The human can note that $\mathit{ab}(a)$ follows from $\forall x(\neg\mathit{ab}(x) \to \mathit{On}(x))$ and $\neg\mathit{On}(a)$,
and from $\mathit{ab}(a)$ the goal follows quickly. This is a relatively simple problem in
first-order logic with equality, the difficulties of which are irrelevant to
circumscription and second-order logic. Weierstrass is able to prove the goal.

Case 3, $z \neq b$ and $z \neq a$. Then $Q(z)$ reduces to $A(z)$, so the goal becomes

\[ A(z) \Rightarrow \mathit{ab}(z). \]

This goal is proved by instantiating the metavariable $A$:

\[ A = \lambda z.(\mathit{ab}(z) \lor T(z)) \]

where $T$ is a new metavariable. The final values of $P$ and $Q$ are thus

\[ P = \lambda v.(v = b \;?\; \mathit{true} : v = a \;?\; \mathit{On}(a) : (\neg\mathit{ab}(v) \land \neg T(v))) \]

\[ Q = \lambda y.(y = b \;?\; \mathit{false} : y = a \;?\; \mathit{ab}(b) : (\mathit{ab}(y) \lor T(y))) \]

To achieve the stated goal $\mathit{On}(b)$, the prover has only needed to deduce that $b$
is not abnormal. Unlike the human, it has not gone ahead to deduce anything
about objects other than $a$ and $b$; the uninstantiated metavariable $T$ remains
“undetermined”. Of course, the constant $b$ might as well have been a variable;
the prover can prove $\forall x(x \neq a \to \mathit{On}(x))$ just as well as it can prove $\mathit{On}(b)$. But
that proof, like the one above, will still use instantiations of $P$ and $Q$ involving
free metavariables.

The proof as produced (and typeset) by our prover can be found in [1]. 
1 Introduction

The noMoRe system (first prototype) implements answer set semantics for propo- 
sitional normal logic programs. It uses an alternative implementation paradigm 
to compute answer sets by computing non-standard graph colorings of labeled 
directed graphs associated with logic programs. Therefore noMoRe is an interest- 
ing experimental tool for scientists working with logic programs on a theoretical 
or practical basis. Furthermore, we have included a tool for visualization of those 
graphs corresponding to programs. 

2 General Information 

The noMoRe-system is implemented in the programming language Prolog; 
it has been developed under the ECLiPSe Constraint Logic Programming 
System [1] and it was also successfully tested with SWI-Prolog [11]. The
source code, test cases and documentation are available at
http://www.cs.uni-potsdam.de/~linke/nomore. In order to use the system, ECLiPSe- or
SWI-Prolog is needed [1,11]. Both Prolog systems are freely available for scientific
use. Clearly, noMoRe runs on every platform on which one of the above Prolog systems
is available. The total number of lines of code is only about 2700, i.e. noMoRe is
very transparent and it nicely reflects the underlying theory.



3 Description of the System 

The experimental prototype of the noMoRe system implements nonmonotonic
reasoning with propositional normal logic programs under answer set semantics [5].
Originally, answer set semantics was defined for extended logic programs¹ [5] as a
generalization of the stable model semantics [4] of normal logic programs. We
consider rules $r$ of the form

\[ p \leftarrow q_1, \ldots, q_n, \mathit{not}\ s_1, \ldots, \mathit{not}\ s_k \tag{1} \]

¹ Extended logic programs are logic programs with classical negation.
where $p$, $q_i$ ($0 \le i \le n$) and $s_j$ ($0 \le j \le k$) are ground atoms, $head(r) = p$,
$body^+(r) = \{q_1, \ldots, q_n\}$, $body^-(r) = \{s_1, \ldots, s_k\}$ and $body(r) = body^+(r) \cup body^-(r)$.
Intuitively, the head $p$ of a rule $p \leftarrow q_1, \ldots, q_n, \mathit{not}\ s_1, \ldots, \mathit{not}\ s_k$ is in some answer
set $A$ if $q_1, \ldots, q_n$ are in $A$ and none of $s_1, \ldots, s_k$ is in $A$. Look at the following
normal logic program

\[ P = \{a \leftarrow b, \mathit{not}\ e.\quad b \leftarrow d.\quad c \leftarrow b.\quad d \leftarrow.\quad e \leftarrow d, \mathit{not}\ f.\quad f \leftarrow a.\} \tag{2} \]

Let us call the rules of program (2) $r_a$, $r_b$, $r_c$, $r_d$, $r_e$, and $r_f$, respectively. Then
$P$ has two different answer sets $A_1 = \{d, b, c, a, f\}$ and $A_2 = \{d, b, c, e\}$. It is easy
to see that the application of $r_f$ blocks the application of $r_e$ wrt $A_1$, because if
$r_f$ contributes to $A_1$, then $f \in A_1$ and thus $r_e$ cannot be applied. Analogously,
$r_e$ blocks $r_a$ wrt answer set $A_2$.
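
The two answer sets of program (2) can be confirmed with a naive check based on the Gelfond-Lifschitz reduct. The sketch below is not how noMoRe works (it enumerates all candidate sets instead of coloring a graph); the rule encoding as `(head, positive_body, negative_body)` triples and all function names are assumptions made for this illustration.

```python
from itertools import combinations

# Each rule is (head, positive_body, negative_body); this is program (2).
program = [
    ("a", {"b"}, {"e"}),
    ("b", {"d"}, set()),
    ("c", {"b"}, set()),
    ("d", set(), set()),
    ("e", {"d"}, {"f"}),
    ("f", {"a"}, set()),
]


def least_model(positive_rules):
    """Least model of a negation-free program, by fixpoint iteration."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in positive_rules:
            if pos <= model and head not in model:
                model.add(head)
                changed = True
    return model


def is_answer_set(rules, candidate):
    """Compare candidate with the least model of the reduct wrt candidate."""
    reduct = [(h, pos) for h, pos, neg in rules if not (neg & candidate)]
    return least_model(reduct) == candidate


def answer_sets(rules):
    """Enumerate all sets of atoms and keep those that are answer sets."""
    atoms = sorted({h for h, _, _ in rules}
                   | {x for _, pos, neg in rules for x in pos | neg})
    found = []
    for k in range(len(atoms) + 1):
        for subset in combinations(atoms, k):
            if is_answer_set(rules, set(subset)):
                found.append(set(subset))
    return found


result = answer_sets(program)
```

The enumeration is exponential in the number of atoms, so it is only meant to make the definition concrete on small programs such as (2).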

3.1 Syntax 

The syntax accepted by noMoRe is Prolog-like (without variables). For example,
program (2) is represented through the following rules:

    a :- b, not e.    b :- d.    c :- b.
    d.    e :- d, not f.    f :- a.

NoMoRe also accepts ground formulas which are treated as propositional atoms. 

3.2 Block Graphs and A-Colorings 

NoMoRe implements a novel paradigm to compute answer sets by computing
non-standard graph colorings of the so-called block graph [6] associated with a
given program $P$. A set of rules $S$ of the form (1) is grounded iff there exists
an enumeration $\langle r_i \rangle_{i \in I}$ of $S$ such that for all $i \in I$ we have that
$body^+(r_i) \subseteq head(\{r_1, \ldots, r_{i-1}\})$². With this terminology, the block graph of $P$ is
defined as follows:

Definition 1. ([6]) Let $P$ be a logic program and let $P' \subseteq P$ be maximal
grounded.³ The block graph $\Gamma_P = (V_P, A_P^0 \cup A_P^1)$ of $P$ is a directed graph with
vertices $V_P = P$ and two different kinds of arcs defined as follows

\[ A_P^0 = \{(r', r) \mid r', r \in P' \text{ and } head(r') \in body^+(r)\} \]

\[ A_P^1 = \{(r', r) \mid r', r \in P' \text{ and } head(r') \in body^-(r)\}. \]

Figure 1 shows the block graph of program (2). Observe that the rules of $P$ are
the nodes of $\Gamma_P$. Since groundedness (by definition) ignores negative bodies, there
exists a unique maximal grounded set $P' \subseteq P$ for each program $P$, that is, $\Gamma_P$
is well-defined. Definition 1 captures the conditions under which a rule $r'$ blocks
another rule $r$ (e.g. $(r', r) \in A_P^1$). We also gather all groundedness information
in $\Gamma_P$, due to the restriction to rules in the maximal grounded part of $P$. This is
essential because a block relation between two rules $r'$ and $r$ becomes effective
only if $r'$ is groundable through other rules. In all, $\Gamma_P$ captures all information
necessary for computing the answer sets of program $P$.

² The definition of the head of a rule is generalized to sets of rules in the usual way.

³ A maximal grounded set $P'$ is a grounded set that is maximal wrt set inclusion.
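
The construction of the block graph can be sketched in a few lines. The code below is an illustration under stated assumptions, not noMoRe's Prolog implementation: rules are stored in a dictionary under the hypothetical names `ra` to `rf`, the maximal grounded set is computed by a simple fixpoint iteration, and the arcs follow the definition of the 0-arcs (positive bodies) and 1-arcs (negative bodies).

```python
# Rules of program (2): name -> (head, positive body, negative body).
program = {
    "ra": ("a", {"b"}, {"e"}),
    "rb": ("b", {"d"}, set()),
    "rc": ("c", {"b"}, set()),
    "rd": ("d", set(), set()),
    "re": ("e", {"d"}, {"f"}),
    "rf": ("f", {"a"}, set()),
}


def maximal_grounded(rules):
    """Largest set of rules that can be enumerated so that each positive body
    is covered by the heads of earlier rules (negative bodies are ignored)."""
    grounded, heads = set(), set()
    changed = True
    while changed:
        changed = False
        for name, (head, pos, _neg) in rules.items():
            if name not in grounded and pos <= heads:
                grounded.add(name)
                heads.add(head)
                changed = True
    return grounded


def block_graph(rules):
    """Return (vertices, 0-arcs, 1-arcs) of the block graph."""
    g = maximal_grounded(rules)
    arcs0 = {(r1, r2) for r1 in g for r2 in g if rules[r1][0] in rules[r2][1]}
    arcs1 = {(r1, r2) for r1 in g for r2 in g if rules[r1][0] in rules[r2][2]}
    return set(rules), arcs0, arcs1


vertices, arcs0, arcs1 = block_graph(program)
```

For program (2) every rule is grounded, the 0-arcs follow the chain of positive dependencies starting at the fact `rd`, and the two 1-arcs record that `re` may block `ra` and that `rf` may block `re`.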

Answer sets then are characterized as special non-standard graph colorings
of block graphs. We denote the 0-predecessors, 0-successors, 1-predecessors and
1-successors of a node $v \in V_P$ in $\Gamma_P$ by $\gamma_0^-(v)$, $\gamma_0^+(v)$, $\gamma_1^-(v)$ and $\gamma_1^+(v)$, respectively.



Definition 2. ([6]) Let $P$ be a logic program s.t. $|body^+(r)| \le 1$ for each $r \in P$,
let $\Gamma_P = (P, A_P^0 \cup A_P^1)$ be the corresponding block graph and let $c : P \to \{\ominus, \oplus\}$ be
a mapping. Then $c$ is an a-coloring (application-coloring) of $\Gamma_P$ iff the following
conditions hold for each $r \in P$:

A1 $c(r) = \ominus$ iff one of the following conditions holds

   a. $\gamma_0^-(r) \neq \emptyset$ and for each $r' \in \gamma_0^-(r)$ we have $c(r') = \ominus$

   b. there is some $r'' \in \gamma_1^-(r)$ s.t. $c(r'') = \oplus$.

A2 $c(r) = \oplus$ iff both of the following conditions hold

   a. $\gamma_0^-(r) = \emptyset$ or there exists a grounded 0-path⁴ $G_r$ s.t. $c(G_r) = \oplus$⁵

   b. for each $r'' \in \gamma_1^-(r)$ we have $c(r'') = \ominus$.

For the generalization of the condition $|body^+(r)| \le 1$ (for $r \in P$) see [6]. There you
can also find further details on a-colorings and the algorithm to compute them.

Observe that there are programs like $P = \{p \leftarrow \mathit{not}\ p\}$ s.t. no a-coloring
exists for $\Gamma_P$. Intuitively, each node of the block graph (corresponding to some
rule) is colored with one of two colors, representing application ($\oplus$) or
non-application ($\ominus$) of the corresponding rule. The coloring presented in Figure 1
corresponds to answer set $A_1$ of $P$. Node (rule) $r_e$ has to be colored $\ominus$ (not
applied), because there is some 1-predecessor of $r_e$ colored $\oplus$ (applied). In other
words, $r_f$ blocks $r_e$.
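
To make the two coloring conditions concrete, the following sketch checks them by brute force on the block graph of program (2). It is an illustration only and rests on several assumptions: the arcs are written out by hand, `"+"` and `"-"` stand for the colors "applied" and "not applied", a grounded 0-path colored "applied" is read as a path from a fact to the rule along 0-arcs on which every rule (including the rule itself) is applied, and all names are hypothetical. The actual algorithm of noMoRe is described in [6] and does not enumerate all colorings.

```python
from itertools import product

# Block graph of program (2): 0-arcs come from positive bodies, 1-arcs from
# negative bodies; rule rd is the only fact.
nodes = ["ra", "rb", "rc", "rd", "re", "rf"]
arcs0 = {("rd", "rb"), ("rd", "re"), ("rb", "ra"), ("rb", "rc"), ("ra", "rf")}
arcs1 = {("re", "ra"), ("rf", "re")}
facts = {"rd"}


def preds(arcs, r):
    """Predecessors of r with respect to the given set of arcs."""
    return {s for s, t in arcs if t == r}


def on_applied_0_path(coloring):
    """Rules reachable from a fact along 0-arcs through applied rules only."""
    reached = {f for f in facts if coloring[f] == "+"}
    frontier = list(reached)
    while frontier:
        s = frontier.pop()
        for s2, t in arcs0:
            if s2 == s and coloring[t] == "+" and t not in reached:
                reached.add(t)
                frontier.append(t)
    return reached


def is_a_coloring(coloring):
    """Check conditions A1 and A2 for every rule."""
    reached = on_applied_0_path(coloring)
    for r in nodes:
        p0, p1 = preds(arcs0, r), preds(arcs1, r)
        a1 = ((bool(p0) and all(coloring[q] == "-" for q in p0))
              or any(coloring[q] == "+" for q in p1))
        a2 = ((not p0 or r in reached)
              and all(coloring[q] == "-" for q in p1))
        if (coloring[r] == "-") != a1 or (coloring[r] == "+") != a2:
            return False
    return True


colorings = [dict(zip(nodes, cs)) for cs in product("+-", repeat=len(nodes))]
a_colorings = [c for c in colorings if is_a_coloring(c)]
# The heads of the applied rules; rule "rx" has head "x" in this encoding.
applied_heads = [{r[1] for r in nodes if c[r] == "+"} for c in a_colorings]
```

Under these assumptions exactly two of the 64 colorings pass the check, and the heads of their applied rules are the two answer sets of program (2).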




Fig. 1. Block graph of program (2) with a-coloring corresponding to answer set $A_1$.



⁴ A subset of rules $G_r \subseteq P$ is a grounded 0-path for $r \in P$ if $G_r$ is a 0-path from some
fact to $r$ in $\Gamma_P$.

⁵ For a set of rules $S \subseteq P$ we write $c(S) = \oplus$ or $c(S) = \ominus$ if for each $r \in S$ we have
$c(r) = \oplus$ or $c(r) = \ominus$, respectively.
3.3 Architecture

NoMoRe computes the answer sets of a logic program $P$ in three steps (see
Figure 2). First, the block graph $\Gamma_P$ is computed. Second, $\Gamma_P$ is compiled into Prolog
code in order to obtain an efficient coloring procedure. The compilation borrows
ideas from techniques utilized in e.g. [10,9] for efficient theorem proving. To read
logic programs we use a parser, and there is a separate part for interpreting
a-colorings as answer sets. For information purposes there is yet another part
for visualizing block graphs using the graph drawing tool DaVinci [7].




Fig. 2. The architecture of noMoRe. 



4 Applying the System 

The noMoRe system is used for purposes of research on the underlying paradigm.
One has to keep in mind that in its current state noMoRe is just a prototype with
limited application. However, considering the short amount of time it took to
develop the system and the progress made concurrently with further developments,
it is safe to assume that a more useful version will be available shortly. But even
in this early state, the system is usable by anybody familiar with the logic
programming paradigm.

5 Evaluating the System 

As a first benchmark, we used two NP-complete problems proposed in [2]: the
problem of finding a Hamiltonian path in a graph (Ham) and the independent
set problem (Ind). In terms of time used for computing answer sets, our first
Prolog implementation (development time 8 months) is not comparable with
state-of-the-art C/C++ implementations, e.g. smodels [8] and dlv [3]. Therefore we
compare the number of used choice points, because it reflects how an algorithm
deals with the exponential part of a problem. Unfortunately, only smodels gives
information about its choice points. For this reason, we have concentrated on
comparing our approach with smodels. Results are given for finding all solutions
of different instances of Ham and Ind. Table 1 shows results for some
Ham-encodings of complete graphs where $n$ is the number of nodes⁶. Surprisingly,
it turns out that noMoRe performs very well on this problem class. That is, with
growing problem size we need fewer choice points than smodels. This can also be
seen in Table 2 which shows the corresponding time measurements. For finding
all Hamiltonian cycles of a $K_{10}$ we need less time than the current smodels
version. To be fair, for Ind-problems of graphs $\mathit{Cir}_n$ we need twice the choice
points (and much more time) smodels needs, because we have not yet implemented
backward-propagation. However, even with the same number of choice
points smodels is faster than noMoRe, because noMoRe uses the general backtracking
of Prolog, whereas the backtracking of smodels is highly specialized for computing
answer sets. The same applies to dlv. Measurements with smodels and dlv
are made with all optimizations (e.g. lookahead, heuristics to select next choice
point) activated, whereas noMoRe currently has no such optimizations.

⁶ In a complete graph each node is connected to each other node.

Table 1. Number of choice points for HAM-problems.



Table 2. Time measurements in seconds for HAM- and IND-problems on a SUN 
Ultra2 with two 300MHz Sparc processors. 
Abstract. Conditional Pure Literal Graphs (CPLG) characterize the
set of models of a propositional formula and are introduced to help un- 
derstand connections among formulas, models and autarkies. They have 
been applied to the SAT problem within the framework of refutation- 
based algorithms. Experimental results and comparisons show that the 
use of CPLGs is a promising direction towards efficient propositional 
SAT solvers based upon model elimination. In addition, they open a 
new perspective on hybrid search/resolution schemes. 



1 Introduction 

Propositional satisfiability is a many-sided problem that captures theoretical
and practical interests. We address both of them, by (1) introducing a tool 
called Conditional Pure Literal Graph and (2) showing its promising practical 
application in speeding up some refutation procedures. With respect to the for- 
mer point, CPLGs are introduced to help us gain insights into the connections 
among formulas, models and autarkies. In particular, the concept of autarky [24] 
is investigated and analyzed in detail. As to the latter point, we present an algo- 
rithm based on resolution which heavily exploits CPLGs. It is shown that redun- 
dancy in the search can be greatly reduced using CPLGs within a scheme that 
merges concepts from direct model search and refutation procedures. Refutation 
procedures constitute one of the two main categories of complete algorithms 
for deciding propositional satisfiability (we focus on complete algorithms as op- 
posed to local search techniques). The other category of complete approaches is 
the one of direct model search algorithms [16,19,8,4,27,9], which are based upon 
the Davis-Putnam (DP) procedure [22]. Almost all of the best performing and 
most widely used complete algorithms for satisfiability [15,6] fall in the model search
category. 

Unlike DP-like algorithms, refutation strategies cope with the satisfiability
problem by trying to derive a contradiction from a given formula by means of
propositional resolution. If a sound and complete procedure derives a contradiction
from a formula $\mathcal{F}$, then $\mathcal{F}$ is guaranteed to be unsatisfiable, and vice-versa.
Many such strategies have been proposed, and we here focus on model elimination
(ME) [20,21,26,23]. Model elimination is a subgoal-reduction strategy that
works by backward reasoning, reducing a goal to a set of subgoals and
recursively working on the new subgoals. It shares redundancy-related problems
with other subgoal-reduction approaches used in automated deduction, as a pure
subgoal-reduction mechanism may run into the same sub-problem many times,
with no memory of the previously found solution. Indeed, reduction
of redundancy is an important issue in the field of automated deduction [7].

A well-known way to prevent redundancy in ME is by caching intermediate
results occurring during refutation (success or failure of some sub-refutation
attempt). As far as successful sub-refutations are concerned, redundancy is
prevented by lemmaizing [2,14,1,3]. Conversely, failures in sub-refutation attempts
in the propositional framework have been reduced [13] by using autarkies [24] as a
caching mechanism [12]. A more general framework that comprises both caching
mechanisms in the form of meta-level rules for inference has been proposed [7],
and theoretical analyses on the effectiveness of these methods have been
performed [25,11]. Together with a reduction of redundancy, a failure-related caching
mechanism can help in extracting a model for satisfiable formulas [10] from an
ME procedure. The key idea is to extract a model relying on the explanation
of why a refutation attempt failed. CPLGs get into this framework by allowing
us (1) to thoroughly understand the concept of autarky and its relation with
formulas, models and caching devices, (2) to heavily reduce redundancy in the
search and (3) to extract a model for a formula using a refutation procedure. In
this sense, our work extends that of A. Van Gelder and F. Okushi [13].

CPLGs are graphs that represent properties of models for a given formula
and extend the concept of autarky to a finer degree of granularity. They are
built and exploited while the refutation procedure goes on to avoid redundancy
by memorizing sets of partial assignments which surely belong to a model under
some conditions (together with explanations of why things go that way).

From a different and intriguing perspective, we can consider CPLG-based
algorithms as an attempt to overcome the dichotomy between direct model search
procedures and refutation-based approaches. Previous approaches in this
direction have shown their strength [17], and our algorithm substantially increases the
degree of coupling between search-related and resolution-related machinery. In
this sense, the CPLG-algorithm as a whole can be seen either as a model search
procedure, in which branching on unassigned variables is made exploiting clause
structure via resolution, or as a resolution-based approach that builds a
candidate model as a consequence of failed refutation caching. This kind of bridge
seems likely to bring better efficiency results than similar previous approaches.
Moreover, such results do not come at the expense of clarity.



The outline of the paper is as follows: Section 2 gives preliminaries on both
propositional formulas and graphs and introduces notations. Section 3 formally
introduces CPLGs and their properties and presents a working example. The
practical use of CPLGs within an ME-procedure is discussed in Section 4.
Experimental results are reported in Section 5 to show how CPLGs are more effective
than previous approaches in reducing redundancy. Finally, in Section 6 we draw
our conclusions and present directions for future work. Further discussions about
CPLGs and formal proofs of all the results presented below can be found in [5].

2 Preliminaries 

We consider formulas in conjunctive normal form (CNF). For our purposes, a
propositional formula is a set of clauses, every clause being a non-trivial set of literals
(a set of literals is trivial if it contains a literal and its negation). Double
negation is absent. Formulas are indicated by calligraphic letters ($\mathcal{F}, \ldots$), clauses
(and sets of literals) by uppercase Greek letters ($\Gamma, \Delta, \ldots$), literals by lowercase
Greek letters ($\beta, \gamma, \ldots$) and variables by lowercase roman letters ($a, b, \ldots$).

An assignment for $\mathcal{F}$ is a non-trivial set of literals $\Delta$ containing only variables
in $\mathcal{F}$; it is total if every variable in $\mathcal{F}$ appears in $\Delta$, partial otherwise. The
variable in a literal $\delta$ is referred to as $var(\delta)$, and the set of variables appearing
in a set of literals $\Delta$ is denoted as $VAR(\Delta) = \{var(\delta) \mid \delta \in \Delta\}$. The notation
$VAR(\mathcal{F})$ is similarly used to refer to the set $\bigcup_i VAR(\Delta_i)$ of variables appearing
in $\mathcal{F} = \{\Delta_1, \Delta_2, \ldots, \Delta_n\}$. The set of every possible positive and negative literal
on the variables $VAR(\mathcal{F})$ is written as $LIT(\mathcal{F})$. The complement set of $\Delta$ is
defined as $\overline{\Delta} = \{\neg\delta \mid \delta \in \Delta\}$. The set $PAS(\mathcal{F})$ of partial assignments on the
variables of $\mathcal{F}$ is defined as $PAS(\mathcal{F}) = \{\Delta \in 2^{LIT(\mathcal{F})} \mid \Delta \cap \overline{\Delta} = \emptyset\}$.

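Definition 1 (star). Given a formula $\mathcal{F}$ and an assignment $\Delta \in PAS(\mathcal{F})$, we
define

\[ \mathcal{F} * \Delta = \{\Gamma \setminus \overline{\Delta} \mid \Gamma \in \mathcal{F},\ \Gamma \cap \Delta = \emptyset\}, \]

that is, the clauses $\Gamma \in \mathcal{F}$ with $\Gamma \cap \Delta \neq \emptyset$ are removed first, and the literals of
$\overline{\Delta}$ are then deleted from the remaining clauses.

$\mathcal{F} * \Delta$ is the formula resulting from $\mathcal{F}$ after the assignment $\Delta$ is made. The
first step represents unit subsumption with all the literals in $\Delta$ considered as unit
clauses, while the second represents unit resolution with the same unit clauses. When
$\Delta = \{\delta\}$ we write $\mathcal{F} * \delta$ instead of $\mathcal{F} * \{\delta\}$. When the star operator is applied to
two sets of literals, its meaning is assumed to be the same as the union operator,
hence $(\mathcal{F} * \Delta) * \gamma = \mathcal{F} * (\Delta * \gamma) = \mathcal{F} * \Delta * \gamma$. We denote with $\mathcal{F}|_\Delta$ the set of
clauses in $\mathcal{F}$ which contain at least one of the literals in $\Delta$. The formula $\mathcal{F}|_\Delta$ is
called projection of $\mathcal{F}$ onto $\Delta$. We write $\mathcal{F}|_\delta$ instead of $\mathcal{F}|_\Delta$ when $\Delta = \{\delta\}$.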
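
The star and projection operators are easy to prototype on clause sets. The sketch below is a minimal illustration under stated assumptions: literals are encoded as nonzero integers in the DIMACS style (a negative integer is a negated variable), clauses are `frozenset`s, and the function names `complement`, `star` and `projection` as well as the small example formula are chosen for this example only.

```python
def complement(delta):
    """Complement set: the negation of every literal in delta."""
    return {-lit for lit in delta}


def star(formula, delta):
    """F * delta: drop the clauses satisfied by delta (unit subsumption), then
    delete the falsified literals from the remaining clauses (unit resolution)."""
    falsified = complement(delta)
    return {clause - falsified for clause in formula if not (clause & delta)}


def projection(formula, delta):
    """F|delta: the clauses of F containing at least one literal of delta."""
    return {clause for clause in formula if clause & delta}


# Variables a..f are encoded as 1..6; a negative integer is a negated variable.
a, b, c, d, e, f = range(1, 7)
F = {frozenset(s) for s in (
    {a, -b}, {-a, -b, f}, {-a, c, d}, {e, a, -c}, {-e, d, -f}, {-d, -f})}

after_a = star(F, {a})
```

Assigning `a` removes the two clauses that contain `a` and deletes the literal `-a` from the others; applying `star` twice with single literals gives the same result as applying it once with their union.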

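Two literal symbols $\top$ and $\bot$ are introduced, such that $\neg\top = \bot$, $\mathcal{F} * \top = \mathcal{F}$
for every $\mathcal{F}$ and $\mathcal{F}|_\bot = \mathcal{F}$ for every $\mathcal{F}$ ($\bot$ and $\top$ are the unit elements for
the projection and star). By reversing the order in which star and projection
are applied, we get the same result, provided arguments are consistent, i.e.:
$(\mathcal{F} * \Delta)|_\Gamma = (\mathcal{F}|_\Gamma) * \Delta$ for any $\Delta, \Gamma \in PAS(\mathcal{F})$ such that $\overline{\Delta} \cap \Gamma = \emptyset$.

Any formula $\mathcal{F}' = \mathcal{F} * \Delta$ for some $\Delta$ is called sub-formula of $\mathcal{F}$, and a
refutation of some sub-formula of $\mathcal{F}$ is called sub-refutation of $\mathcal{F}$.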

We work on directed graphs and indicate them by uppercase roman letters
and their nodes by lowercase roman letters. We call source node a node with
no incoming arcs (wrt a graph $W$). Similarly, a subgraph $G'$ of a graph $G$ is a
source-subgraph if no direct arc from $G - G'$ to $G'$ exists. The set $par_G(w)$ of
nodes in $G$ from which a direct arc to $w \in G$ exists is called the parent set of $w$
in $G$. Given two nodes $w, w' \in W$, $w'$ is said to be in the scope of $w$ if a path
from $w$ to $w'$ exists. We call transitive-removal of the node $w$ in $W$ the operation
of removing from $W$ the node $w$ together with every node $w'$ in its scope (and
together with every arc that loses its root or destination node).
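
The notions of scope and transitive-removal can be sketched as follows. The representation of a graph as a set of nodes plus a set of `(source, destination)` pairs, the function names and the small example graph are assumptions made for this illustration.

```python
def scope(arcs, w):
    """All nodes reachable from w by a path of at least one arc."""
    seen, frontier = set(), [w]
    while frontier:
        node = frontier.pop()
        for src, dst in arcs:
            if src == node and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return seen


def transitive_removal(nodes, arcs, w):
    """Remove w, every node in its scope, and every arc touching a removed node."""
    removed = {w} | scope(arcs, w)
    kept_nodes = set(nodes) - removed
    kept_arcs = {(s, t) for s, t in arcs if s in kept_nodes and t in kept_nodes}
    return kept_nodes, kept_arcs


nodes = {"u", "v", "w", "x", "y"}
arcs = {("u", "v"), ("v", "w"), ("w", "x"), ("u", "y")}
remaining = transitive_removal(nodes, arcs, "v")
```

Removing `v` also removes `w` and `x`, which lie in its scope, and leaves only the nodes `u` and `y` with the arc between them.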

2.1 Autarkies 

Definition 2 (autarky). A (partial) truth assignment $\Delta \in PAS(\mathcal{F})$ that
satisfies a subset $S \subseteq \mathcal{F}$ of a propositional formula $\mathcal{F}$, and contains no variable in
$\mathcal{F} - S$, is said to be an autarky for $\mathcal{F}$.

This definition ensures that the set of models of $\mathcal{F} - S$ is not shrunk by an
autarky $\Delta$, every clause in $\mathcal{F} - S$ being untouched.

Lemma 1. $\Delta \in PAS(\mathcal{F})$ is an autarky for $\mathcal{F}$ iff $\mathcal{F}|_{\overline{\Delta}} \subseteq \mathcal{F}|_\Delta$.

The above lemma concisely characterizes an autarky as a (partial) assignment in
which clauses involved in resolution are a subset of those involved in
subsumption. As an example, consider the following formula.

\[ \mathcal{F} = \{\{a, \neg b\}, \{\neg a, \neg b, f\}, \{\neg a, c, d\}, \{e, a, \neg c\}, \{\neg e, d, \neg f\}, \{\neg d, \neg f\}\} \]

According to Lemma 1, it is $\mathcal{F}|_{\overline{\Delta}} \subseteq \mathcal{F}|_\Delta$ for the autarky $\Delta = \{a, \neg b, c\}$:

\[ \{\{\neg a, \neg b, f\}, \{\neg a, c, d\}, \{e, a, \neg c\}\} \subseteq \{\{a, \neg b\}, \{\neg a, \neg b, f\}, \{\neg a, c, d\}, \{e, a, \neg c\}\} \]
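
The characterization of Lemma 1 translates directly into a subset test. The sketch below checks it on the example formula; the integer encoding of literals (variables a to f as 1 to 6, negative integers for negated variables) and the function names are assumptions made for this illustration.

```python
def projection(formula, delta):
    """F|delta: the clauses of F containing at least one literal of delta."""
    return {clause for clause in formula if clause & delta}


def is_autarky(formula, delta):
    """Lemma 1: delta is an autarky iff F|complement(delta) is a subset of F|delta."""
    complement = {-lit for lit in delta}
    return projection(formula, complement) <= projection(formula, delta)


# Variables a..f are encoded as 1..6; a negative integer is a negated variable.
a, b, c, d, e, f = range(1, 7)
F = {frozenset(s) for s in (
    {a, -b}, {-a, -b, f}, {-a, c, d}, {e, a, -c}, {-e, d, -f}, {-d, -f})}

delta = {a, -b, c}
touched_by_resolution = projection(F, {-lit for lit in delta})
touched_by_subsumption = projection(F, delta)
```

For `delta = {a, -b, c}` the three clauses touched by resolution are among the four clauses touched by subsumption, so the test succeeds, whereas the single literal `a` alone is not an autarky for this formula.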

A model is a special case of autarky: if the subset $S \subseteq \mathcal{F}$ of the formula $\mathcal{F}$
satisfied by the autarky is equal to the formula itself, the autarky is a model.
This is guaranteed to happen when the autarky is a total assignment. However,
it is not necessary for an assignment (and hence for an autarky) to be total in
order to be a model. We say that a literal $\psi$ in a model $\Delta$ of $\mathcal{F}$ is essential if
$\Delta - \{\psi\}$ is not a model for $\mathcal{F}$.

If we find an autarky $\Delta$ for a given formula $\mathcal{F}$, we are not guaranteed that $\mathcal{F}$
is satisfiable, unless $\Delta$ is a model itself. However, we have already pointed out
that the set of models of the sub-formula $\mathcal{F} - \mathcal{F}|_\Delta$ is not shrunk by $\Delta$. These
results can be summarized as follows.

Theorem 1. Given any autarky $\Delta$ on $\mathcal{F}$, $\mathcal{F}$ is satisfiable if and only if $\mathcal{F} * \Delta$ is satisfiable.

3 Understanding CPLG 

We now introduce the key concept of Conditional Pure Literal Graph, starting 
from the well known definition of pure literal. 

Definition 3 (PL). A literal $\varphi$ is said to be pure in a formula $\mathcal{F}$ when it occurs
in $\mathcal{F}$ and its negation $\neg\varphi$ does not.

Definition 4 (CPL). A literal $\varphi$ is said to be a conditional pure literal (CPL)
in $\mathcal{F}$ with respect to a partial assignment $\Delta$ if it is a pure literal in $\mathcal{F} * \Delta$, i.e.
$(\mathcal{F} * \Delta)|_{\neg\varphi} = \emptyset$. The assignment $\Delta$ is called the condition or the premise for $\varphi$
to be pure.

We use the shorthand $\Delta \rhd_{\mathcal{F}} \varphi$ (or simply $\Delta \rhd \varphi$ when no confusion arises) in
place of $(\mathcal{F} * \Delta)|_{\neg\varphi} = \emptyset$ to denote a literal $\varphi$ that is pure under the condition $\Delta$
with respect to the formula $\mathcal{F}$. If a conditional pure literal has an empty premise,
then it is a pure literal.

Definition 5 (CPLG). A Conditional Pure Literal Graph $W$ on a propositional
formula $\mathcal{F}$ is a directed graph with the following properties:

— every node is labeled with a literal in $LIT(\mathcal{F})$, and every variable
in $VAR(\mathcal{F})$ is represented at most once;

— the nodes are partitioned into hypothesis-nodes and CPL-nodes, every
hypothesis-node being a source for the graph;

— every CPL-node $w \in W$ is labeled by a literal $\varphi$ such that
$\Delta_w \triangleright_{\mathcal{F}} \varphi$, where $\Delta_w$ is the set of
literals labeling nodes in $par_W(w)$.

The set of literals labeling CPL-nodes in a CPLG $W$ is denoted by $CPL(W)$,
while $HYP(W)$ is the set of literals labeling hypothesis-nodes. We set
$NODES(W) = CPL(W) \cup HYP(W)$. Notice that every hypothesis-node has to be a
source for the graph according to Definition 5, but CPL-nodes may be sources as
well.

Definition 6 (proper and minimal CPLs). A CPL $\Delta \triangleright \varphi$ is
said to be:

1. proper iff $\Delta' \triangleright \varphi$ holds for no premise
$\Delta' \subset \Delta$;

2. of minimal size iff $\Delta' \triangleright \varphi$ holds for no condition
$\Delta'$ with $|\Delta'| < |\Delta|$.

Definition 7 (proper, self-contained and complete CPLG). A CPLG $W$ on a
formula $\mathcal{F}$ is:

1. proper iff every CPL-node in $W$ is proper;

2. self-contained iff $HYP(W) = \emptyset$;

3. complete iff $VAR(\mathcal{F}) = VAR(NODES(W))$.

As CPLs are the bricks used to build a CPLG-wall, it is natural to introduce two
operators to explain how walls are built and dismantled.

Definition 8 (extension step). Let $\mathcal{W}_\Gamma$ be the set of CPLGs on
the set of variables $\Gamma$ and $\mathcal{C}_\Gamma = PAS(\Gamma) \times LIT(\Gamma)$
the set of CPLs on $\Gamma$. We define a partial binary function

$ext : \mathcal{W}_\Gamma \times \mathcal{C}_\Gamma \to \mathcal{W}_\Gamma$

such that $W' = ext(W, \Delta \triangleright \varphi)$ is defined only when
$var(\varphi) \notin VAR(CPL(W))$ and $\overline{NODES(W)} \cap \Delta = \emptyset$,
and is obtained from $W$ as follows:

— a new hypothesis-node is added to $W$ for every literal in $\Delta - NODES(W)$;

— if $\varphi \in HYP(W)$, the node labeled by $\varphi$ is turned into a
CPL-node; if $\varphi \notin HYP(W)$, a CPL-node labeled by $\varphi$ is added
to $W$;

— for every literal $\delta_i \in \Delta$ an arc from the node labeled by
$\delta_i$ to the node labeled by $\varphi$ is added to $W$.

$W'$ is said to extend $W$ by $\Delta \triangleright \varphi$.

Fig. 1. Graphical representation and composition of CPLs

The reflexive and transitive closure of the relation just defined can be
considered. This way, a CPLG $W'$ is said to extend a CPLG $W$ if $W'$ can be
obtained from $W$ by zero or more extension steps.

In contrast to the extension operation, a prune operation is defined to capture
the way a CPLG changes when some hypothesis-node is removed. The intuition is
that when a hypothesis-node is removed, all the CPL-nodes that (directly or
not) rely on the removed hypothesis must be removed as well.

Definition 9 (prune step). Let $\mathcal{W}_\Gamma$ be the set of CPLGs on the
set of variables $\Gamma$. We define a partial binary function
$prune : \mathcal{W}_\Gamma \times LIT(\Gamma) \to \mathcal{W}_\Gamma$ such that
$W' = prune(W, \varphi)$ is defined only when $\varphi \in HYP(W)$ and is
obtained by a transitive removal of $\varphi$ from $W$.
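The two operators can be sketched on a small dictionary-based graph. This is only an illustration under stated assumptions: literals are strings such as "a" and "-a"; the class name CPLG and its fields are hypothetical; the second precondition of the extension step is read as "no literal of the premise is the complement of a literal already labeling a node"; and hypothesis-nodes left without children after a prune are simply kept.

```python
def neg(lit):
    """Complement of a literal written as 'a' or '-a'."""
    return lit[1:] if lit.startswith("-") else "-" + lit


def var(lit):
    """Variable underlying a literal."""
    return lit.lstrip("-")


class CPLG:
    def __init__(self):
        self.hyp = set()    # literals labeling hypothesis-nodes
        self.premise = {}   # CPL-node literal -> literals of its parent nodes

    def nodes(self):
        return self.hyp | set(self.premise)

    def ext(self, delta, phi):
        """Extension step: add the CPL 'delta |> phi' to the graph."""
        delta = set(delta)
        if var(phi) in {var(lit) for lit in self.premise}:
            raise ValueError("the variable of phi already labels a CPL-node")
        if {neg(lit) for lit in self.nodes()} & delta:
            raise ValueError("the premise clashes with a node of the graph")
        self.hyp |= delta - self.nodes()   # new hypothesis-nodes
        self.hyp.discard(phi)              # a hypothesis on phi becomes a CPL-node
        self.premise[phi] = delta          # arcs from every premise literal to phi

    def prune(self, phi):
        """Prune step: transitive removal of the hypothesis-node phi."""
        if phi not in self.hyp:
            raise ValueError("prune is defined only on hypothesis-nodes")
        doomed, stack = {phi}, [phi]
        while stack:
            node = stack.pop()
            for child, parents in self.premise.items():
                if node in parents and child not in doomed:
                    doomed.add(child)
                    stack.append(child)
        self.hyp -= doomed
        for node in doomed:
            self.premise.pop(node, None)


W = CPLG()
W.ext(set(), "-f")               # a pure literal: empty premise
W.ext({"-b"}, "d")               # -b enters the graph as a hypothesis-node
W.ext({"d", "-f", "-b"}, "a")
print(sorted(W.hyp))             # hypothesis-nodes after three extension steps
W.prune("-b")                    # removes -b and everything relying on it
print(sorted(W.nodes()))         # what survives the prune step
```

In this run the node with an empty premise survives the prune step, since it does not lie in the scope of any hypothesis.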

A simple graphical representation for CPLs and CPLGs is adopted. Figure 1(I)
illustrates how a CPL $\{\delta_2, \delta_3, \delta_4\} \triangleright \delta_1$
is represented. The conditions under which $\delta_1$ is pure are drawn in
dotted circles, while the CPL itself is represented within a continuous border.
This kind of representation naturally evolves into a graph structure when CPLGs
are considered instead of single CPLs. Figure 1(II) represents a sample CPLG.
Figure 1(III) illustrates an extension step: the CPLG (II) is extended by the
CPL (I) to obtain the CPLG (III).



3.1 An Example 

Let us consider the following formula: 

$\mathcal{E} = \{\{a, b\}, \{a, \neg b, d\}, \{\neg a, \neg f\}, \{\neg a, \neg b, \neg d\}, \{\neg a, d, e\}, \{\neg b, \neg c\}, \{c, e\}, \{c, \neg e\}, \{\neg c, \neg f\}, \{\neg c, \neg e\}, \{d, \neg f\}\}$

and the CPLG $W_{\mathcal{E}}$ on $\mathcal{E}$ represented in Figure 2(I). A
number of observations can be made even in this simple case.

— The graph contains four nodes (equivalently, four variables and four
literals). Three of them are CPL-nodes, and one is a hypothesis-node.

— The hypothesis-node $d$ is present in $W_{\mathcal{E}}$ as a premise of $a$.
It is a source node.

— The CPL-node $\neg f$ has no premise, so it actually is a pure literal. This
can be quickly verified by looking at $\mathcal{E}$.

— The node $\neg b$ is a pure literal conditioned to $\Delta_{\neg b} = \{a\}$.
This means that all the clauses in $\mathcal{E}$ that contain $b$ are satisfied
by the assignment $\Delta_{\neg b}$.

— The node $a$ is a pure literal conditioned to
$\Delta_a = \{\neg f, d, \neg b\}$. This means that all the clauses in
$\mathcal{E}$ that contain $\neg a$ are satisfied by the assignment $\Delta_a$.

— A CPLG may be cyclic, and $W_{\mathcal{E}}$ is cyclic. There is no
contradiction between the definition of CPLG and the fact that a CPL
$\varphi'$ is in the premise of a CPL $\varphi''$ while - at the same time -
$\varphi''$ belongs to the premise of $\varphi'$. The nodes $a$ and $\neg b$ are
involved in one such “one-length” cycle in Figure 2(I).

— $W_{\mathcal{E}}$ is a proper, not self-contained, not complete CPLG.

Fig. 2. Two CPLGs on $\mathcal{E}$

The CPLG $W_{\mathcal{E}}$ in Figure 2(I) can be extended to the complete and
self-contained graph $W'_{\mathcal{E}}$ in Figure 2(II). It is easy to check
that $W'_{\mathcal{E}}$ is still a CPLG on $\mathcal{E}$.
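The observations on the graph of Figure 2(I) can be checked mechanically. The sketch below is illustrative only. It assumes that the star operator drops the clauses satisfied by the assignment and deletes the literals it falsifies; it encodes literals as strings; and it describes the graph by a set hyp of hypothesis literals plus a dictionary premise mapping each CPL-node to the literals of its parents. All helper names are hypothetical.

```python
from itertools import combinations


def neg(lit):
    """Complement of a literal written as 'a' or '-a'."""
    return lit[1:] if lit.startswith("-") else "-" + lit


def star(formula, delta):
    """F * delta (assumed reading): drop satisfied clauses, delete falsified literals."""
    falsified = {neg(lit) for lit in delta}
    return {frozenset(c - falsified) for c in formula if not (c & delta)}


def is_cpl(formula, delta, phi):
    """delta |> phi: no clause of F * delta contains the complement of phi."""
    return all(neg(phi) not in c for c in star(formula, set(delta)))


def is_proper(formula, delta, phi):
    """No strict subset of the premise keeps phi conditionally pure."""
    delta = set(delta)
    return not any(is_cpl(formula, set(sub), phi)
                   for k in range(len(delta))
                   for sub in combinations(delta, k))


E = [frozenset(c) for c in [
    {"a", "b"}, {"a", "-b", "d"}, {"-a", "-f"}, {"-a", "-b", "-d"},
    {"-a", "d", "e"}, {"-b", "-c"}, {"c", "e"}, {"c", "-e"},
    {"-c", "-f"}, {"-c", "-e"}, {"d", "-f"},
]]

# The graph of Figure 2(I) as described in the list above.
hyp = {"d"}
premise = {"-f": set(), "-b": {"a"}, "a": {"-f", "d", "-b"}}

labels_are_cpls = all(is_cpl(E, prem, phi) for phi, prem in premise.items())
proper = all(is_proper(E, prem, phi) for phi, prem in premise.items())
self_contained = not hyp
variables = {lit.lstrip("-") for c in E for lit in c}
complete = variables == {lit.lstrip("-") for lit in hyp | set(premise)}

print(labels_are_cpls, proper, self_contained, complete)
```

Only the third property of Definition 5 is tested here; the first two (one node per variable, hypothesis-nodes as sources) hold by construction of the two containers.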



3.2 Properties 

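The condition $\Delta_\varphi$ under which a literal $\varphi$ not pure in
$\mathcal{F}$ becomes pure in $\mathcal{F} * \Delta_\varphi$ has an obvious
meaning in terms of models for subsets of $\mathcal{F}$.

Lemma 2. If $\varphi$ is a CPL in $\mathcal{F}$ conditioned to $\Delta_\varphi$,
then $\Delta_\varphi$ is a model for $\mathcal{F}|_{\neg\varphi}$.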

Lemma 2 can be exploited to obtain the following results about CPLGs. 

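Lemma 3. For every CPLG $W$ on a given formula $\mathcal{F}$ it holds that:

1. $HYP(W)$ is a model for $(\mathcal{F} * CPL(W))|_{\overline{CPL(W)}}$;

2. $CPL(W)$ is a model for $(\mathcal{F} * HYP(W))|_{\overline{CPL(W)}}$;

3. $(\mathcal{F} * NODES(W))|_{\overline{CPL(W)}} = \emptyset$.

As an example, Lemma 3 applied to the CPLG in Figure 2(I) says that (1) $d$ is
a model for $(\mathcal{E} * \{a, \neg b, \neg f\})|_{\{\neg a, b, f\}}$,
(2) $\{a, \neg b, \neg f\}$ is a model for
$(\mathcal{E} * d)|_{\{\neg a, b, f\}}$, and
(3) $(\mathcal{E} * \{a, \neg b, \neg f, d\})|_{\{\neg a, b, f\}} = \emptyset$.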

In the previous section, we recalled what an autarky is. Now we can explain 
autarkies in terms of CPLGs, according to the following lemma. 

Lemma 4. For every CPLG $W$ on a given formula $\mathcal{F}$ it holds that:

1. $CPL(W)$ is an autarky for $\mathcal{F} * HYP(W)$;

2. $NODES(W)$ is an autarky for $\mathcal{F}$ if $W$ is self-contained;

3. $NODES(W)$ is a model for $\mathcal{F}$ if $W$ is self-contained and complete.

Notice that $NODES(W)$ may be an autarky even though $W$ contains
hypothesis-nodes and that every (even non self-contained) CPLG on $\mathcal{F}$
may contain many self-contained sub-graphs that are CPLGs on $\mathcal{F}$. So,
many autarkies can be encoded into a single CPLG. For example, the two cuts
$c_1$ and $c_2$ represented in Figure 2(II) leave to their left two source and
self-contained subgraphs $W_1$ and $W_2$. Both $NODES(W_1) = \{\neg f\}$ and
$NODES(W_2) = \{\neg f, a, \neg b, d\}$ are autarkies.

An essential result for integrating CPLGs into SAT algorithms is given in the
following theorem.

Theorem 2. For any CPLG $W$ on $\mathcal{F}$ it holds that
$\mathcal{F} * HYP(W) \overset{sat}{\longleftrightarrow} \mathcal{F} * NODES(W)$;
equivalently, if $\mathcal{S} = \mathcal{F} * HYP(W)$, then
$\mathcal{S} \overset{sat}{\longleftrightarrow} \mathcal{S} * CPL(W)$.

Notice that Theorem 2 does not guarantee that $\mathcal{F} * HYP(W)$ and
$\mathcal{F} * NODES(W)$ are logically equivalent. Nevertheless, it ensures that
$\mathcal{F} * HYP(W)$ and $\mathcal{F} * NODES(W)$ are equivalent as to
satisfiability. So, the idea for integration within SAT algorithms is that we
can consider $\mathcal{F} * NODES(W)$ instead of $\mathcal{F} * HYP(W)$ as long
as we are interested in deciding satisfiability for $\mathcal{F}$.
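On small formulas the statement of Theorem 2 can be observed by brute force. The sketch below uses the formula of Section 3.1 and the graph of Figure 2(I), whose hypothesis set is {d} and whose node set is {a, ¬b, ¬f, d}. It is an illustration only: the star operator is assumed to drop satisfied clauses and delete falsified literals, literals are encoded as strings, and the exhaustive satisfiable function is a hypothetical helper, not part of the paper.

```python
from itertools import product


def neg(lit):
    """Complement of a literal written as 'a' or '-a'."""
    return lit[1:] if lit.startswith("-") else "-" + lit


def star(formula, delta):
    """F * delta (assumed reading): drop satisfied clauses, delete falsified literals."""
    falsified = {neg(lit) for lit in delta}
    return {frozenset(c - falsified) for c in formula if not (c & delta)}


def satisfiable(formula):
    """Exhaustive search over all total assignments (fine for a handful of variables)."""
    variables = sorted({lit.lstrip("-") for c in formula for lit in c})
    for values in product([False, True], repeat=len(variables)):
        model = {v if val else "-" + v for v, val in zip(variables, values)}
        if all(c & model for c in formula):
            return True
    return False


E = [frozenset(c) for c in [
    {"a", "b"}, {"a", "-b", "d"}, {"-a", "-f"}, {"-a", "-b", "-d"},
    {"-a", "d", "e"}, {"-b", "-c"}, {"c", "e"}, {"c", "-e"},
    {"-c", "-f"}, {"-c", "-e"}, {"d", "-f"},
]]

hyp = {"d"}
nodes = {"a", "-b", "-f", "d"}

under_hyp = star(E, hyp)
under_nodes = star(E, nodes)
print(len(under_hyp), satisfiable(under_hyp))
print(len(under_nodes), satisfiable(under_nodes))
```

The second formula is much smaller than the first, which is the practical point of the theorem: satisfiability can be decided on the reduced formula.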

4 CPLG within Model Elimination 

We can construct complete and self-contained CPLGs on $\mathcal{F}$ by
inspecting the structure of $\mathcal{F}$, provided we know a model for such
formula. Conversely, a SAT algorithm does not know any model in advance, its aim
being to discover whether such a model exists. So, what is the utility of CPLGs
in a SAT algorithm?

Reversing the perspective, we will use a CPLG as a tool to help build a model
for $\mathcal{F}$, a tool which may be constructed and used before the entire
model (or even just an autarky) is known. The concept of CPLG advantageously
meets the SAT problem within the framework of propositional model elimination,
where it plays the role of a caching device to store information about failed
sub-refutation attempts, for later re-use. Figure 3 shows a doubly-recursive
algorithm written in a C-like pseudo-code which attempts to refute a formula by
propositional model elimination. It is the starting point for our work. The
global variable $\mathcal{F}$ represents the formula to refute. The algorithm is
activated by “refuteGoal(∅, ⊤)”. The first argument of both procedures is a set
of literals called ancestor literals (or simply ancestors). They determine the
current sub-refutation to be the refutation of $\mathcal{F} * \Delta$. The
second argument of refuteGoal is a literal to be refuted, called sub-goal, while
the second argument of refuteClause is a clause to be refuted. Figure 4 shows a
version of refuteGoal modified to introduce the use of a CPLG. The global
variable $W$ represents the (initially empty) CPLG on which the algorithm is
working. The procedure refuteClause is unchanged.

Extension, prune and use are three different operations performed by the
procedure in Figure 4 on the CPLG $W$. Let us consider them in turn.

ME attempts to show that no model exists, by systematically proving the
refutability of sub-formulas $\mathcal{F} * \Delta$ for some suitably generated
partial assignments $\Delta$. Many sub-refutations may fail, even though the
formula $\mathcal{F}$ is unsatisfiable (i.e. its top-level refutation succeeds).
Figure 4 shows that failing a sub-refutation is the precondition for succeeding
in adding a CPL to the CPLG (the extension step). When a sub-refutation
succeeds, a prune step takes place. The refuted sub-goal is for sure a
hypothesis-node in the CPLG. The nodes of the graph in the scope of $\varphi$
must be removed, to guarantee that only sound (wrt the use step) information is
held in $W$. In other words, a transitive removal on $\varphi$ is performed.

    boolean refuteGoal(Δ, φ)
    {
        Δ' ← Δ ∪ {φ};
        Σ ← F|¬φ * Δ';
        refutationSucceed ← false;
        while (not refutationSucceed and |Σ| > 0)
        {
            Γ ← getAClause(Σ);
            refutationSucceed ← refuteClause(Δ', Γ);
            Σ ← Σ − Γ;
        }
        return refutationSucceed;
    }

    boolean refuteClause(Δ, Γ)
    {
        refutationSucceed ← true;
        while (refutationSucceed and |Γ| > 0)
        {
            φ ← getALiteral(Γ);
            refutationSucceed ← refuteGoal(Δ, φ);
            Γ ← Γ − φ;
        }
        return refutationSucceed;
    }

Fig. 3. Structure of the basic model elimination algorithm

    boolean refuteGoal(Δ, φ)
    {
        Δ' ← Δ ∪ {φ};
        Σ' ← F|¬φ * Δ';
        Σ ← Σ' * CPL(W);                          /* use */
        refutationSucceed ← false;
        while (not refutationSucceed and |Σ| > 0)
        {
            Γ ← getAClause(Σ);
            refutationSucceed ← refuteClause(Δ', Γ);
            Σ ← (Σ − Γ) * CPL(W);                 /* use */
        }
        if (refutationSucceed)
            W ← prune(W, φ);                      /* prune */
        else
        {                                         /* extension */
            Δφ ← a sub-set of Δ ∪ CPL(W) which is a model for F|¬φ;
            W ← ext(W, Δφ ▷ φ);
        }
        return refutationSucceed;
    }

Fig. 4. Structure of the refuteGoal procedure modified to introduce a CPLG
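The basic doubly-recursive procedure of Figure 3 can be turned into a small runnable sketch. It is an interpretation under explicit assumptions, not the authors' code: literals are strings such as "a" and "-a"; the trivially true top-level sub-goal is represented by None, in which case every clause is a candidate; and simplifying by the ancestors is taken to mean dropping the clauses they satisfy and deleting the literals they falsify. The function name refutes is hypothetical.

```python
def neg(lit):
    """Complement of a literal written as 'a' or '-a'."""
    return lit[1:] if lit.startswith("-") else "-" + lit


def refutes(formula):
    """Return True when the top-level refutation succeeds."""
    clauses = [frozenset(c) for c in formula]

    def refute_goal(ancestors, goal):
        # goal is None only at the top level (the trivially true sub-goal).
        extended = ancestors | {goal} if goal is not None else ancestors
        falsified = {neg(lit) for lit in extended}
        # Clauses that can reduce the goal, simplified by the extended ancestors.
        candidates = [c - falsified for c in clauses
                      if (goal is None or neg(goal) in c) and not (c & extended)]
        # The goal is refuted as soon as one candidate clause is refuted.
        return any(refute_clause(extended, c) for c in candidates)

    def refute_clause(ancestors, clause):
        # A clause is refuted when every remaining literal is refuted;
        # the empty clause is refuted vacuously.
        return all(refute_goal(ancestors, lit) for lit in clause)

    return refute_goal(frozenset(), None)


E = [
    {"a", "b"}, {"a", "-b", "d"}, {"-a", "-f"}, {"-a", "-b", "-d"},
    {"-a", "d", "e"}, {"-b", "-c"}, {"c", "e"}, {"c", "-e"},
    {"-c", "-f"}, {"-c", "-e"}, {"d", "-f"},
]
U = [{"a", "b"}, {"-a", "b"}, {"a", "-b"}, {"-a", "-b"}]

print(refutes(E))  # E has a model, so no refutation can be found
print(refutes(U))  # all four sign combinations on a, b: refutable
```

The built-in any and all short-circuit exactly like the two while loops of the figure, and every sub-goal introduces a variable not yet among the ancestors, so the recursion depth is bounded by the number of variables.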

As a consequence of the way the extension and prune steps are performed, the
CPLG $W$ maintains two interesting properties (see Section 4.3). (1) The
literals in $CPL(W)$ cannot be refuted given $HYP(W)$. (2) $HYP(W)$ is always
contained in the ancestor set. Therefore, any literal in $CPL(W)$ is not
refutable given the current set of ancestors: the use step exploits the CPLG as
a caching device by avoiding - in any refutation attempt - the use of every
clause containing literals known to be not refutable (it uses
$\mathcal{F} * \Delta' * CPL(W)$ instead of $\mathcal{F} * \Delta'$).

4.1 Computing the Premise of a CPL 

A sub-set of $\Delta \cup CPL(W)$ satisfying $\mathcal{F}|_{\neg\varphi}$ is
chosen in Figure 4 to be the premise of the CPL on $\varphi$. After the relevant
subset $\Delta'' \subseteq \Delta \cup CPL(W)$ of literals on variables involved
in $\mathcal{F}|_{\neg\varphi}$ is isolated, the graph can be extended by
$\Delta'' \triangleright \varphi$. However, this CPL is not guaranteed to be
either minimal or proper, whereas we would like to have such CPLs in our CPLG
because of the way prune steps are performed. Minimal and proper CPLs are less
prone to be removed than general CPLs are, because their survival relies on
fewer hypotheses.

In general, given any CPL on $\varphi$ with premise $\Gamma$, there exists at
least one proper CPL on $\varphi$ (and a CPL of minimal size) with premise
$\Gamma' \subseteq \Gamma$. So, there is a tradeoff between the time spent
looking for better CPLs (premises of small size) and the time spent rebuilding a
CPLG pruned just because proper CPLs were not used. A greedy strategy can be
used to incrementally build a premise for $\varphi$ to be pure in
$\mathcal{F}|_{\neg\varphi}$, starting from the empty set $\Gamma_0 = \emptyset$.
It is sufficient to consider all the clauses
$\Lambda_i \in \mathcal{F}|_{\neg\varphi}$ for $i = 1, \ldots, n$, in any order,
and put $\Gamma_i = \Gamma_{i-1}$ if $\Lambda_i$ is satisfied by
$\Gamma_{i-1}$, and $\Gamma_i = \Gamma_{i-1} \cup \{\delta\}$ for some
$\delta \in \Delta'' \cap \Lambda_i$ otherwise. A heuristic strategy can be
introduced, using a search algorithm that uses $\Delta''$ as a trusted guide
towards a model, and makes a heuristic choice on literals aimed at finding a
small subset of $\Delta''$ that is still a model for
$\mathcal{F}|_{\neg\varphi}$. Finally, an exhaustive strategy can resort to
backtracking. As soon as a satisfying subset of $\Delta''$ is found, the
algorithm chronologically backtracks along the history of its choices and
restarts the search towards an even smaller model.

Even if the last strategy is the only one that guarantees a proper CPL as a
result, an empirical evaluation of the considered tradeoff led us to prefer the
greedy strategy for the experiments presented in Section 5.
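The greedy strategy amounts to a single pass over the clauses. The sketch below is illustrative and makes a few assumptions of its own: literals are strings, the clauses passed in are those containing the complement of the literal of interest, the trusted set is known to satisfy all of them, and ties are broken by taking the alphabetically smallest literal (any choice would do). The name greedy_premise is hypothetical.

```python
def greedy_premise(clauses, trusted):
    """Scan the clauses once, adding a trusted literal only when the
    premise built so far does not already satisfy the current clause."""
    gamma = set()
    for clause in clauses:
        if clause & gamma:
            continue                       # already satisfied: premise unchanged
        gamma.add(min(clause & trusted))   # any literal of the trusted set in the clause
    return gamma


# Clauses of the formula of Section 3.1 that contain -a, with a trusted set
# that satisfies all of them.
clauses_on_a = [{"-a", "-f"}, {"-a", "-b", "-d"}, {"-a", "d", "e"}]
print(sorted(greedy_premise(clauses_on_a, {"c", "-b", "d", "-f"})))

# The result depends on the choices made and need not be proper:
# here {y} alone would satisfy both clauses.
print(sorted(greedy_premise([{"x", "y", "-p"}, {"y", "z", "-p"}], {"x", "y", "z"})))
```

The second call shows the price paid for the single pass: the premise returned contains a literal that a backtracking search would have discarded.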

4.2 An Example of Building and Exploiting CPLG 

We now describe how the modified refutation procedure may construct and use
the CPLG in Figure 2(II) on the formula $\mathcal{F} = \mathcal{E}$ of
Section 3.1, through the intermediate steps in Figure 5(II). Figure 5(I) shows a
portion of the generated refutation tree. Closed branches are marked by an
hourglass-like symbol. Sub-goals have a left and a right numeric label. Numbers
on the left side give the order in which nodes are expanded, while the numbers
on the right give the order in which they are exited (i.e. the order in which
CPLs are generated and represented in Figure 5(II)).

The first CPL to be identified is the one on $\neg f$. When the recursion stops
at the expansion step on the literal $\neg f$ with ancestors $\{c\}$,
$\mathcal{F} * \{c\}$ does not contain clauses in which $f$ appears (otherwise,
a further reduction step would have been taken). Now it suffices to extract a
condition $\Delta_{\neg f}$ for the CPL on $\neg f$, i.e. a model for
$\mathcal{F}|_{f}$, which surely exists and is a subset of the ancestors. In
this particular case, the empty set is a well suited assignment, as the positive
literal $f$ does not appear at all in $\mathcal{F}$, so that $\neg f$ is a pure
literal and $\emptyset \triangleright \neg f$ is added to $W$ (see
Figure 5(II).1).

The second CPL to be extracted is the one on $d$. This time, the subset
$\Delta_d$ of the set of ancestors $\{c, \neg b, a\}$ that is a model for
$\mathcal{F}|_{\neg d}$ turns out to be $\{\neg b\}$. Hence,
$\{\neg b\} \triangleright d$ is added to $W$ (see Figure 5(II).2).

The third CPL found by the procedure is $\Delta_a \triangleright a$ with
$\Delta_a = \{d, \neg f, \neg b\}$. This is an interesting case, because the
condition $\Delta_a$ consists of an ancestor of $a$, a descendant of $a$ and a
CPL already in $W$ (Figure 5(II).3).

When the fourth CPL ($\{a\} \triangleright \neg b$) is discovered
(Figure 5(II).4) and added to $W$, the whole CPLG $W$ becomes self-contained.
At this point, $CPL(W) = NODES(W) = \{a, \neg b, d, \neg f\}$ is known to be an
autarky for $\mathcal{F}$. The procedure goes on leaving the leaf-node $\neg e$
and adding the relative CPL (step 5 in Figure 5(II)) and finally introducing the
CPL $\{\neg f, \neg b, \neg e\} \triangleright c$ in the sixth step to obtain
the complete graph in Figure 2(II).

Fig. 5. Failed refutation tree and CPLG construction for $\mathcal{E}$

Even in the short refutation trace just seen, the procedure readily takes
advantage of the presence of a CPLG. Not only does the procedure end up with a
model encoded in the CPLG, but it also exploits the unfinished graph structure
to avoid doing some useless work. We now show how by means of some examples.

The tree in Figure 5 is meant to represent only the failure-related portion of
the global tree generated by the ME procedure. Nevertheless, according to
Figure 3, it seems to be incomplete. Below the sub-goal $a$ the clause
$\{\neg a, \neg f\}$ seems to be missing. The leaf-node $\neg e$ should not be a
leaf-node because $\{\neg a, d, e\} \in \mathcal{F} * c$. We also expect that
other clauses should be proven to be non-refutable below the root. These
omissions are not an oversight. Some refutations did not take place as their
outcomes are known to be failures, thanks to the presence of the CPLG. The
reduction of $a$ by $\{\neg a, \neg f\}$ is not attempted because the CPLG
(number 3) proves that $\neg f$ is not refutable. In the same way,
$\{\neg a, d, e\}$ is not chosen to reduce $\neg e$ because the CPLG (number 5)
ensures that the current set of ancestors $\{c\}$ suffices to make $\neg e$
non-refutable. Finally, when $\{c, e\}$ is proven to be non-refutable, the CPLG
(Figure 2(II)) by then encodes a complete model for $\mathcal{F}$, which makes
it meaningless to proceed in further refutation attempts. All these refutations
are skipped because the sub-formula $\mathcal{F} * \Delta' * CPL(W)$ considered
in the modified algorithm does not contain the “redundant” clauses, due to the
$CPL(W)$ factor, whereas $\mathcal{F} * \Delta'$ alone would have still
contained them.

In this simple example, prune steps are never performed. 
4.3 Properties of the Algorithm

Logically speaking, we have just considered a safe use of the CPLG. However,
the algorithm shown in Figure 4 also makes an implicit and possibly unsafe use
of the information in the CPLG. Let us consider just the basic case of a single
CPL $\Delta \triangleright_{\mathcal{F}} \delta$. As a semantic-level
consequence of this CPL we can write
$\mathcal{F} * \Delta \not\models \neg\delta$, but it is not guaranteed that
$\mathcal{F} * \Delta \models \delta$ (the simple case when
$\Delta = \emptyset$ helps us understand why: models with pure literals assigned
to false may exist). However, the algorithm in Figure 4 assumes so, and behaves
accordingly: when a literal $\varphi$ belongs to $CPL(W)$, every sub-goal
$\neg\varphi$ is implicitly refuted, according to the meaning of the star
operator in $\mathcal{F} * \Delta' * CPL(W)$.

This lazy assumption is strictly related to the observation we made after 
Theorem 2 and is also the reason why we need to explicitly prove the correctness 
of the algorithm. Before doing this, we need to make precise an underlying 
assumption about the graph W the algorithm deals with. 

Lemma 5. At every time during the execution of the algorithm in Figure 4, the
variable $W$ represents a graph that is a CPLG on the formula $\mathcal{F}$
considered. In addition to this, if we consider the graph $W$ and the set of
ancestors $\Delta$ at any time during the search procedure, it holds that
$HYP(W) \subseteq \Delta$.



Theorem 3. The algorithm in Figure 4 is sound and complete. 

The CPLG goes through many changes as extension and prune steps alternate. 
However, not everything is made to be broken. For example, when an autarky 
is found, it is never lost, no matter how many prune steps occur. The reason is 
that only pieces of the graph that are in the scope of some hypothesis-node can 
be removed, while autarkies are encoded by self-contained source sub-graphs. 
So, prune steps are not always destructive. In favorable cases they could even 
remove hypotheses upon which no CPL relies. An ultimate property of CPLGs 
comes out of this game of cut and extension steps: the procedure eventually finds 
a model if the formula is satisfiable. 

Theorem 4. If $W$ is the CPLG remaining after the algorithm in Figure 4
terminates its execution on a satisfiable formula $\mathcal{F}$, $NODES(W)$ is a
model for $\mathcal{F}$.

As a final remark, we point out that CPLGs are built as separate data structures
(not totally coupled with the search procedure) so that autarkies (and even
models) can be recognized while the search procedure is going on at any depth.

5 Experimental Results 

Even though the algorithm in Figure 3 can suitably be combined with CPLGs,
it is known to suffer from a high degree of redundancy, as Table 1(I) shows. The
number of subgoals expanded on average on very small uniform random formulas is
reported. The number of subgoals expanded is chosen to measure the efficiency of
the procedure as it is independent of the machine speed and it allows for easy
comparisons between subgoal-reduction based algorithms. The CV ratio is the
ratio between the number of clauses and variables in the formula. The middle
value (4.27) is chosen according to an empirical complexity result [18,8]
indicating this ratio as the hardest one for uniform random 3CNF formulas. The
other two values are arbitrarily chosen to represent under-constrained formulas
(likely-to-be-satisfiable) and hyper-constrained ones (likely-to-be-unsatisfiable).



Table 1. Subgoals expanded by (I) the basic and (II) the modified ME procedure

(I) Basic ME procedure

        Number of variables
CV      6         8         10         12
3.27    2,251     12,991    57,045     272,982
4.27    5,059     43,019    298,011    1,912,228
5.27    10,082    93,741    921,880    >5,000,000

(II) Modified ME procedure

        Number of variables
CV      6         8         10         12
3.27    45        71        120        261
4.27    27        43        90         141
5.27    10        18        26         33



The use of CPLGs helps avoid such redundancy. Table 1(II) shows the effect of
introducing CPLGs. We here concentrate on the relative advantage the algorithm
achieves with respect to failed refutation caching. The promising results
achieved are indicative of impressive potential performance. In this
perspective, a significant measure can be obtained by comparing the CPLG
approach with MODOC (proposed by Van Gelder and Okushi [12,13]), which is
another SAT solver based upon the procedure in Figure 3. It uses boosting
mechanisms to perform both failed-subrefutation and succeeded-subrefutation
caching. To directly compare their approach with ours, we extracted from MODOC
the failure caching machinery, and re-implemented it upon the usual basic
algorithm as the only boosting mechanism. We call the resulting procedure
mini-MODOC.

Table 2 shows the result of comparisons between mini-MODOC and our algorithm
on (I) uniform, (II) chain and (III) tree random formulas. Each cell in the
table contains a value of the form “x ÷ y → p%”, where x and y are the number
of subgoals expanded on average by mini-MODOC and our algorithm respectively,
and p = 100 * (1 − y/x) is the percentage of expansions saved using CPLGs.
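As a worked instance of the formula for p, the short sketch below recomputes the percentage for two cells of Table 2. The function name saved_percentage is hypothetical; the input pairs are the x and y values of those cells.

```python
def saved_percentage(x, y):
    """Percentage of sub-goal expansions saved: p = 100 * (1 - y / x)."""
    return 100 * (1 - y / x)


# Uniform random formulas, 30 variables, CV = 3.27: x = 2,081 and y = 1,774.
print(round(saved_percentage(2081, 1774)))

# Chain random formulas, 20 sub-formulas of 20 variables: x = 42,056 and y = 5,876.
print(round(saved_percentage(42056, 5876)))
```

Rounding to the nearest integer reproduces the percentages printed in those two cells (15% and 86%).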

Structured random generators are used to capture many properties of the
instances coming from realistic domains, retaining the typical advantages of
parametric and controllable random generators. We here focus on chain and tree
generators. The key idea behind both of them is to generate several independent
sub-instances (uniform random formulas), and then to obtain a single global
instance by connecting these sub-instances. The chain generator obtains
connections by introducing a supplementary amount of clauses containing
variables from two sub-formulas ($n$ independent sub-formulas $\mathcal{F}_i$,
$i = 1, \ldots, n$, are connected in a linear chain by means of $n - 1$
additional clauses containing only two literals, the first on a variable in
$VAR(\mathcal{F}_i)$ and the second on a variable in $VAR(\mathcal{F}_{i+1})$,
for $i = 1, \ldots, n - 1$). The tree generator connects sub-formulas in a
random tree structure. For every pair of sub-formulas that are neighbors in the
tree, connection is guaranteed by imposing that the two formulas in the pair
share a fixed percentage of their variables. Once the number of literals in each
clause is fixed (here always 3), these random generators need, as additional
parameters, the number of sub-instances to be connected and the number of
variables and clauses in each sub-instance. A (k,m)-tree is a formula where each
sub-instance has k + m variables, k of which are shared with its neighbors. In
our experiments with the chain generator, we consider chains of 10, 20 and 30
sub-formulas, each one containing 10, 20 and 30 variables and twice as many
clauses as variables. Table 2(II) reports experimental results on chains, and
shows an impressive advantage of the CPLG based algorithm. As to the tree
generator, we considered trees with 40, 60 and 80 nodes, each one containing 9,
12 and 15 variables and twice as many clauses as variables. Results shown in
Table 2(III) confirm the great advantage exhibited when dealing with structured
domains.

Table 2. Comparison on different random domains

(I) Uniform random formulas

CV      30 variables             40 variables                50 variables
3.27    2,081 ÷ 1,774 → 15%      12,915 ÷ 9,437 → 27%        127,321 ÷ 80,962 → 36%
4.27    46,593 ÷ 42,119 → 10%    556,440 ÷ 485,151 → 13%     [illegible] ÷ 9,983,028 → 15%
5.27    48,603 ÷ 46,647 → 4%     696,375 ÷ 654,848 → 6%      8,457,481 ÷ 8,015,975 → 5%

(II) Chain random formulas

#F      10 variables             20 variables                30 variables
10      479 ÷ 389 → 19%          1,790 ÷ 1,048 → 41%         165,600 ÷ 6,549 → 96%
20      898 ÷ 799 → 11%          42,056 ÷ 5,876 → 86%        343,510 ÷ 12,019 → 97%
30      1,227 ÷ 1,030 → 16%      40,607 ÷ 11,785 → 71%       815,020 ÷ 16,300 → 98%

(III) Tree random formulas

#F      (3,6)-trees                          (4,8)-trees                       (5,10)-trees
40      3,604 ÷ 2,332 → 35%                  120,764 ÷ 63,626 → 47%            433,398 ÷ 115,826 → 73%
60      4,848 ÷ [illegible] → [illegible]    290,685 ÷ 60,663 → [illegible]    3,796,063 ÷ 1,848,278 → 51%
80      19,936 ÷ 8,267 → 59%                 453,676 ÷ 178,471 → 61%           9,510,065 ÷ 3,804,026 → 60%
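The chain construction described above can be sketched as follows. This is a hedged illustration, not the generator used for the experiments: variables are encoded as positive integers and negative literals as negative integers, signs are drawn uniformly at random, each sub-formula gets twice as many 3-literal clauses as variables, and the function names uniform_3cnf and chain_formula are hypothetical.

```python
import random


def uniform_3cnf(variables, n_clauses, rng):
    """Uniform random 3CNF over the given variables: each clause picks
    three distinct variables and a random sign for each of them."""
    clauses = []
    for _ in range(n_clauses):
        chosen = rng.sample(variables, 3)
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in chosen))
    return clauses


def chain_formula(n_sub, n_vars, rng):
    """n_sub independent sub-formulas on disjoint variable blocks, linked
    in a linear chain by n_sub - 1 two-literal clauses."""
    blocks = [list(range(i * n_vars + 1, (i + 1) * n_vars + 1))
              for i in range(n_sub)]
    clauses = []
    for block in blocks:
        clauses += uniform_3cnf(block, 2 * n_vars, rng)
    for left, right in zip(blocks, blocks[1:]):
        link = (rng.choice(left), rng.choice(right))
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in link))
    return clauses


rng = random.Random(0)            # fixed seed, for reproducibility only
formula = chain_formula(10, 10, rng)
print(len(formula))               # 10 * 20 sub-formula clauses plus 9 links
```

A tree generator would differ only in how sub-instances are connected: neighbors would share a fixed fraction of their variables instead of being linked by extra binary clauses.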



6 Conclusions and Future Work 

We presented a simple and powerful tool - called CPLG - to help understand 
connections among propositional formulas, models, autarkies and caching de- 
vices. Properties of this tool have been investigated and several examples have 
been presented to illustrate how it works. CPLGs proved to be useful when 
exploited within ME based algorithms. We showed in detail how the bridge
between CPLGs and ME is built, and presented a resulting algorithm for the
satisfiability problem. Soundness, completeness and model extraction capability
have been investigated for this algorithm.

Even if we heavily restricted ourselves in the use of complementary boosting 
mechanisms, results are quite promising. The use of CPLGs in isolation from
other devices allowed us to highlight their contribution in comparison with
similar techniques. In particular, we tested our approach against a recent failure 
caching mechanism built upon the same basic ME algorithm (namely, the one 
used in MODOC). Experimental results show our greater efficiency on both 
uniform and structured random instances. Encouraged by these results, we are 
extending our technique to an algorithm based upon a more general pruning 
device that generalizes CPLGs. Conditional Pure Literal Graphs are the first 
step towards this promising kind of integration of direct model search and 
refutation. 
Abstract. This paper is devoted to the experimental evaluation of several
state-of-the-art search heuristics and optimization techniques in proposi- 
tional satisfiability (SAT). The test set consists of random 3CNF formulas 
as well as real world instances from planning, scheduling, circuit analysis, 
bounded model checking, and security protocols. All the heuristics and 
techniques have been implemented in a new library for SAT, called SIM. 
The comparison is fair because in SIM the selected heuristics and techniques
are realized on a common platform. The comparison is meaningful because SIM as
a solver performs very well when compared to other state-of-the-art solvers.



1 Introduction 

The problem of propositional satisfiability (SAT) is fundamental in many areas 
of computer science such as formal verification, planning and theorem proving. 
Despite the exponential worst-case complexity of all known algorithms for solving 
SAT [1], recent implementations of the Davis-Logemann-Loveland (DLL) algo- 
rithm are able to solve problems having thousands of propositions in a few sec- 
onds. Much of the success of these solvers is due to clever search heuristics and 
various optimization techniques that are implemented on top of the basic DLL 
algorithm. In general, it is difficult to assess the effectiveness of such heuristics 
and techniques because (i) the quality of the implementations varies greatly and, 
(ii) each solver is hard-coded with specific search strategies. 

In this paper we present an experimental comparison of several state-of-the- 
art search heuristics and optimization techniques. The test set consists of random 
3CNF formulas as well as real world instances from planning, scheduling, circuit 
analysis, bounded model checking, and security protocols. All the heuristics and 
techniques have been implemented in a new library for SAT, called SIM. The
comparison is fair because in SIM the selected heuristics and techniques are
realized on a common platform. Thus, our experimental evaluation is not biased
by the differences due to the quality of the implementation, and provides a
clear picture of the relative effectiveness of different search strategies. The
comparison is meaningful because SIM itself performs very well when compared to
other state-of-the-art solvers.

The paper is structured as follows. In Section 2 we characterize the test set
that we have been using. We also experiment with various state-of-the-art SAT
solvers on our test set with the purpose of showing that the chosen problems are
indeed meaningful, and of getting an indication of the relative strength of the
various SAT solvers. Section 3 is devoted to “basic SIM”, i.e., to the
presentation of SIM corresponding to the basic DLL algorithm. In Sections 4 and
5, we present the procedures implementing backjumping and learning,
respectively. The architecture, the branching heuristics, and the backjumping
and learning procedures are described in as much detail as possible (given the
space requirements). These detailed presentations are necessary in order to
fully understand not only the procedures, but also the computational cost
associated with each of them. As a further contribution of the paper, we
describe all the procedures (except for the ones corresponding to the branching
heuristics) by presenting the corresponding pseudo-code. We believe that our
presentation is detailed enough to provide a good starting point for
implementation. The presentation of the procedures in each Section is
complemented with the experimental analysis, showing the relative efficiency of
the different heuristics if implemented in basic SIM (Section 3), or combined
with backjumping (Section 4) and learning (Section 5). In the final remarks
(Section 6) we summarize the results obtained, and point out that SIM, used as
a stand-alone solver, behaves very well.

Formal Preliminaries: We use the term atom as a shorthand for propositional letter. A
literal is an atom or its negation. If l is a literal, (i) |l| is the atom in l, and (ii) $\bar{l}$ is ¬l
if l is an atom, and is |l| otherwise. A clause C is an n-ary disjunction of literals such
that, for each pair of literals l, l' in C, it is not the case that |l| = |l'|. A clause is Horn
if it contains at most one atom not preceded by the negation symbol. A formula is an
m-ary conjunction of clauses. As customary, we think of clauses as sets of literals, and
formulas as sets of clauses. An assignment is a function mapping each atom into {T, F}.
An assignment can be extended to formulas according to the standard truth tables of
propositional logic. An assignment μ satisfies a formula φ if μ(φ) = T. A formula φ is
satisfiable if there exists an assignment which satisfies φ. The problem we are interested
in is: “Given a formula, is it satisfiable?”.
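
As an illustration, the definitions above can be sketched in Python. The encoding of literals as signed integers and all the function names below are assumptions made for this example; they are not part of SIM.

```python
from typing import Dict, FrozenSet, Iterable

Literal = int                 # assumption: +a encodes the atom a, -a its negation
Clause = FrozenSet[Literal]   # a clause is a set of literals


def atom(l: Literal) -> int:
    """|l|: the atom occurring in the literal l."""
    return abs(l)


def complement(l: Literal) -> Literal:
    """The negation of l if l is an atom, and |l| otherwise."""
    return -l


def is_clause(c: Clause) -> bool:
    """No two literals of a clause may share the same atom."""
    return len({atom(l) for l in c}) == len(c)


def is_horn(c: Clause) -> bool:
    """Horn: at most one atom not preceded by the negation symbol."""
    return sum(1 for l in c if l > 0) <= 1


def satisfies(mu: Dict[int, bool], phi: Iterable[Clause]) -> bool:
    """True iff the total assignment mu makes every clause of phi true."""
    return all(any(mu[atom(l)] == (l > 0) for l in c) for c in phi)


phi = [frozenset({1, -2}), frozenset({2, 3}), frozenset({-1, -3})]
mu = {1: True, 2: True, 3: False}
print(satisfies(mu, phi))
```

Deciding satisfiability then amounts to asking whether some `mu` exists for which `satisfies(mu, phi)` holds.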



2 Experimenting with DLL Solvers 

2.1 Designing the Test Set 

Traditionally, SAT solvers are compared on instances corresponding to real world
problems and/or randomly generated samples. It is well known that problems
belonging to the two categories have different characteristics. Roughly speaking, real
world instances present some “structure”, corresponding somehow to the
“structure” of the original problem coded in SAT. For example, in the case of a
SAT-instance corresponding to the verification problem of a circuit C, typically
the SAT-instance consists of different parts, each part corresponding to a
sub-circuit of C. Randomly generated tests became popular after [2], in which it is
shown that, using the Fixed Clause Length model (FCL) [3], it is possible to
generate very hard instances. Since then, they have been widely used to test SAT
solvers’ performance. However, they lack the structure of real world instances.
Given their different characteristics, we decided to use both a test set
consisting of instances corresponding to real world problems and a test set of randomly
generated samples.

Our test set of real world problems consists of 200 instances, 97 satisfiable
and 103 unsatisfiable, yielding an approximate 50% chance of a satisfiable (resp.
unsatisfiable) instance. In selecting these instances, we wanted to have benchmarks
(i) which are well known in the literature, and (ii) that, if possible, have already
been used for comparing SAT solvers. In particular, we chose:

— the “famous” 30 parity problems, see, e.g., [4]; 

— the 16 instances of the Beijing Competition held in 1996: 7 circuit equivalences,
6 scheduling problems, 3 planning problems;

— the 32 Data Encryption Standard (DES) problems, see [5,6]; 

— 34 formal equivalence verification problems, see [7]; 

— 31 instances of bounded model checking (BMC), see [8,6]; 

— 7 instances of formal verification properties of pipelined circuits from Velev’s
web page http://www.ece.cmu.edu/~mvelev;

— 11 quasigroup problems, see [9]; 

— 39 planning problems, including 2 hanoi* instances from SATLIB
(www.satlib.org), and blocks world, logistics and rockets problems: most
of these problems have already been used to test SAT solvers (see, e.g., [10])
and are available from SATLIB as well.

Notice that some of these instances are known to be very hard to solve for currently 
available SAT solvers. This is, e.g., the case for the 8 DES instances cnf-r4-*. 
We decided to have them on purpose, given that we plan to use this test set also 
for future experimental evaluations. All the 200 instances are available on SIM’s
web page.

For the random problems, we use the FCL model. Formulas are thus generated 
according to the number of propositions N, the number of literals per clause K, 
and the number of clauses L. Usually, K is fixed to 3 (see, e.g., [11]), N is fixed 
to a certain value, while L is varied in order to cover the “100% satisfiable - 
100% unsatisfiable” transition. In our experiments, we considered N = 300. For 
N = 300, none of the solvers that we consider consistently exceeds the time 
limit, and the samples in the phase transition (i.e., for the value of L where the 
probability of a formula being satisfiable is 50%) are not trivial. 
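
As an illustration, a generator for the FCL model described above can be sketched in Python. The text only fixes N, K and L; the choice of K distinct atoms per clause, each negated with probability 0.5, is the usual convention and is an assumption here, as are the function name and the L/N ratios in the usage loop.

```python
import random
from typing import List


def fcl_formula(n: int, k: int, l: int, rng: random.Random) -> List[List[int]]:
    """Return l clauses, each made of k literals over distinct atoms in 1..n."""
    formula = []
    for _ in range(l):
        atoms = rng.sample(range(1, n + 1), k)   # k distinct atoms (assumption)
        # Each atom is negated with probability 0.5 (assumption).
        formula.append([a if rng.random() < 0.5 else -a for a in atoms])
    return formula


rng = random.Random(0)
N, K = 300, 3
for ratio in (3.8, 4.0, 4.2, 4.4, 4.6, 4.8):      # illustrative L/N values only
    L = round(ratio * N)
    samples = [fcl_formula(N, K, L, rng) for _ in range(100)]
    print(ratio, L, len(samples))
```

Varying L while keeping N and K fixed, as in the loop, is what lets the samples cover the transition from mostly satisfiable to mostly unsatisfiable formulas.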



2.2 Experimental Results 

There are many SAT solvers available. In our analysis, we restricted our
attention to the DLL-based publicly available procedures. Among these, we considered
some of the most effective. In particular: SATZ213, a new version of SATZ [11],
POSIT ver. 1.0 [12], EQSATZ [6], RELSAT ver. 2.00 [10], SATO ver. 3.2 [13]. We do not
describe these systems, for which we refer to the corresponding citations. Here, we
only point out that these systems rely on very different data structures and/or
search heuristics and/or techniques. Indeed, some of them have been designed to
solve specific classes of problems. This is the case, e.g., of EQSATZ, which has been
designed to solve formulas with a lot of equivalences, like the parity problems.

Fig. 1. POSIT, SATZ213, EQSATZ, RELSAT, and SATO. Left: random samples. Right: real
world instances.

Of course, we decided to test these solvers (and not others) in a completely
arbitrary way. However, the above solvers are state-of-the-art, and present interesting
features. SATZ and SATZ213 are well known for their dramatic performance on
randomly generated tests. POSIT is interesting because of its differences from SATZ
(and SATZ213). As reported in [11], the main difference is in the branching
heuristic the two solvers use: both examine k propositions before choosing the one to
assign, but while POSIT specifies an upper bound on k, SATZ does not. Almost
quoting [11], SATZ examines many more propositions at each node than POSIT.
Given the fundamental role played by the branching heuristic, we wanted to have
a direct comparison between these two solvers on our tests. RELSAT and SATO
are well known for their effectiveness on real world problems. EQSATZ is very
recent, but has already shown impressive performance on problems with lots of
equivalences [6].

Before discussing the experimental results, we give some more information. All the
tests have been run on several identical machines (Pentium III, 600MHz, 128MB RAM). The
execution of a system on an instance (be it random or real world) is stopped after
1200s of CPU time. All the state-of-the-art solvers have been run in their default
configuration on all the instances, with the only exception of SATO on randomly
generated problems. As a matter of fact, SATO implements a form of “size learning”
in which all the “reasons” of length < 20 are indefinitely added to the set of input
clauses. Given that learning does not help on randomly generated tests, and that
RELSAT implements a form of “relevance learning” in which all the “reasons” of
length < 3 are indefinitely retained, we decided to change SATO’s default when
tested on random samples. In particular, we let SATO keep only the clauses with
length < 3, like RELSAT. Learning clauses of length < 3 on random problems does not
slow down SATO significantly, as happens with its default value. In the
case of SIM, options are changed in order to see the corresponding effects. For
the plots showing systems’ performances on random tests, the x-axis is the ratio
L/N; the y-axis (in logarithmic scale) is the median CPU time on 100 samples per
point for each solver. For the real world tests, the y-axis is the number of instances
solved by each solver within the CPU time specified on the x-axis.

Figure 1 shows the performances of POSIT, SATZ213, EQSATZ, RELSAT, and
SATO on our test sets. As can be seen, POSIT and SATZ213 are very effective on
random tests, while RELSAT (with 164 instances solved) and SATO (with 163) are
very effective on the real world ones. The poor performance of POSIT with respect
to SATZ213 on the real world instances suggests that SATZ213’s philosophy, i.e.,
exploring many propositions before choosing one, pays off. Remarkably,
EQSATZ is worse than SATZ213 on both random and real world tests. These data
(with the exception of EQSATZ, about which not much is known yet) could have
been expected. Indeed, POSIT and SATZ213 (resp. RELSAT and SATO) have been
designed to be effective on random (resp. real world) problems. As we will see, the
good performances of RELSAT and SATO on the real world instances are due
to their learning mechanisms. As for EQSATZ, its performances are somewhat
surprising: evidently, the data structures and procedures that are at the basis of the
big wins for problems with lots of equivalences (EQSATZ is the only system able to
solve some of the par32-* parity instances in 1200s) have some overhead causing
big losses for problems with (almost) no equivalences.

A more detailed data analysis shows that some instances which are solved by
some solvers are not solved by the others, and the other way around. We take
this fact as an indication that our test set of real world problems is a good one.
Indeed, the cnf-r4-* instances mentioned in Section 2.1 are not solved by any
solver.

3 Basic DLL in SIM 

In this section we give a rather detailed description of the data structures, the
search control, and the heuristics of SIM. These parts together define a basic
implementation of the DLL algorithm [14]. The conventions we use to present the
data types and algorithms are those of [15], described on pages 4-5. In particular,
array elements are accessed by specifying the array name followed by the index in
square brackets: for example, A[i] indexes the i-th element of the array A.
Compound data are organized into objects which are comprised of several attributes
(or fields). A particular field is accessed using the field name followed by an
instance of its object in square brackets: for example, f[x] accesses the field f in the
object x. Instances, i.e., variables representing arrays or objects, are treated as
pointers to the data representing the arrays or objects. If a pointer does not refer
to any object, we give it the special value NIL. Finally, stacks are considered a
primitive data type and are accessed with the usual primitives Push, Pop, Top,
etc. We also assume to have a primitive Flush which flushes the stack.

3.1 Data Structures and Basic Primitives of SIM 

We now introduce propositions, literals, clauses, and states by defining the
corresponding data types. Assuming that these data types have already been defined
(as we do below), a proposition for us is an instance of the proposition data type,
and analogously for the others. In the following, we assume that T, F, U, UN, LS,
and RS are six pairwise distinct constants, each one being distinct from NIL. A
proposition data type is comprised of the following attributes (in the following, p
is a proposition):

— value is either T, F, or U: intuitively, value[p] is the value assigned to p;

— mode is either UN, LS, or RS: intuitively, mode[p] is UN (resp. LS, resp. RS) if p
is assigned by unit-propagation (resp. left split, resp. right split);

— Pos and Neg are arrays of clauses: intuitively, they are the clauses in which p
occurs positively and negatively.

For each proposition p, we say that p is open if value[p] = U and valued otherwise.
A literal data type has the attributes:

— prop is the proposition associated to (or corresponding to) the literal;

— v is the sign of the literal, either T or F.

For each literal l, we say that l is open or valued if the corresponding proposition
prop[l] is open or valued, respectively; l is positive (resp. negative) if v[l] = T
(resp. v[l] = F). A clause data type is comprised of the attributes:

— open is a non negative integer, representing the number of open literals in the
clause;

— sub is the clause subsumer represented by a proposition;

— Lits is an array of literals.

For each clause cl, we say that cl is (i) empty if open[cl] = 0; (ii) unary (resp.
binary) if open[cl] = 1 (resp. open[cl] = 2); (iii) open if sub[cl] = NIL, and valued
(or subsumed) otherwise. A proposition p occurs in a clause cl if p = prop[l] for
some literal l in Lits[cl]. The proposition occurs positively if l is positive and
negatively otherwise. Finally, a state data type is comprised of the attributes:

— Props, an array of propositions; 

— Clauses, an array of clauses; 

— Stack and Unit are the search stack of propositions and the unit propagation 
stack of clauses, respectively; 

— open is a non negative integer, representing the number of open clauses. 

Each state s, in order to represent a valid state of the computation, must satisfy 
the following properties: 

1. Props[s] stores precisely the propositions that occur in the clauses of 
Clauses[s]; 

2. for each proposition p in Props[s], Pos[p] stores precisely the clauses contained 
in Clauses [s] where p occurs positively, and Neg [p] stores precisely the clauses 
contained in Clauses[s] where p occurs negatively; 

3. open[s] amounts to the number of open clauses in Clauses[s]; 

4. Stack[s] stores precisely the valued propositions in Props[s]; 

5. Unit[s] contains all the open unary clauses in Clauses[s]. 

If s represents an initial state of the computation, then for s it is also true that:

6. Clauses[s] is not empty; each clause cl in Clauses[s] is open; its number of
open literals is > 0 and is precisely the length of Lits[cl];

7. each proposition in Props[s] is open. 







Intuitively, it is easy to see that the proposition, literal and clause data types are
faithful representations of the corresponding propositional logic objects as defined
in Section 1. The arrays Clauses[s] and Props[s], together with their contents of
propositions and clauses, are a faithful representation of formulas. Notice that we
explicitly disallow empty input formulas, i.e. {}, and input formulas containing
empty clauses, e.g. {{}}, since the satisfiability problem for these formulas can be
simply solved by inspection.

About the primitives of our propositional data structures, let s be a state, p a
proposition in Props[s], v a boolean value (i.e., T or F) and m a mode (i.e., one of
UN, LS, RS). Extend-Prop(s, p, v, m) is the function that extends the valuation
v with mode m to the whole state s for the proposition p: it returns F if an empty
clause is found in Clauses[s], and U otherwise. More precisely, Extend-Prop(s,
p, v, m) does the following:

— set value[p] to v and mode[p] to m; push p in Stack[s];

— for each clause cl in Pos[p] (resp. Neg[p]), if sub[cl] = NIL and v = T (resp.
v = F) then set sub[cl] = p and decrement open[s] (unit subsumption);

— for each clause cl in Neg[p] (resp. Pos[p]), if sub[cl] = NIL and v = T (resp.
v = F) then decrement open[cl] (unit resolution). If the clause becomes unary,
push it in Unit[s];

— return F if any clause became empty in the above step, and U otherwise.

Notice the use of Unit[s] to detect unary clauses, and of open[s] to keep track of 
subsumed clauses: both operations come for free as a result of performing Extend- 
Prop, while their corresponding simple-minded implementations are linear in the 
size of Clauses[s]. Since Extend-Prop destructively updates the state, we have 
a primitive, Retract-Prop, to revert the effects of Extend-Prop. For lack of 
space, we do not describe Retract-Prop here, but SiM features both primitives, 
thus avoiding the burden of copying and saving the state each time Extend-Prop 
is performed. 
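
As an illustration, the data types and the two primitives can be sketched in Python as follows. The class layout, the signed-integer encoding of literals and, above all, the body of `retract_prop` are assumptions made for this example: Retract-Prop is not described in the text, so what follows is only one plausible inverse of Extend-Prop.

```python
from typing import Dict, List, Optional

T, F, U = "T", "F", "U"          # truth values; U marks an open proposition
UN, LS, RS = "UN", "LS", "RS"    # assignment modes


class Clause:
    def __init__(self, lits: List[int]):
        self.lits = list(lits)           # +p / -p: proposition p with sign T / F
        self.open = len(lits)            # number of open literals
        self.sub: Optional[int] = None   # subsuming proposition (None stands for NIL)


class State:
    def __init__(self, formula: List[List[int]]):
        self.clauses = [Clause(c) for c in formula]
        props = sorted({abs(l) for c in formula for l in c})
        self.value: Dict[int, str] = {p: U for p in props}
        self.mode: Dict[int, Optional[str]] = {p: None for p in props}
        self.pos = {p: [c for c in self.clauses if p in c.lits] for p in props}
        self.neg = {p: [c for c in self.clauses if -p in c.lits] for p in props}
        self.stack: List[int] = []                             # valued propositions
        self.unit = [c for c in self.clauses if c.open == 1]   # open unary clauses
        self.open = len(self.clauses)                          # number of open clauses


def _split(s: State, p: int, v: str):
    """Clauses satisfied, and clauses shortened, when p takes the value v."""
    return (s.pos[p], s.neg[p]) if v == T else (s.neg[p], s.pos[p])


def extend_prop(s: State, p: int, v: str, m: str) -> str:
    s.value[p], s.mode[p] = v, m
    s.stack.append(p)
    satisfied, shortened = _split(s, p, v)
    for cl in satisfied:                 # unit subsumption
        if cl.sub is None:
            cl.sub = p
            s.open -= 1
    result = U
    for cl in shortened:                 # unit resolution
        if cl.sub is None:
            cl.open -= 1
            if cl.open == 1:
                s.unit.append(cl)
            elif cl.open == 0:
                result = F               # an empty clause was found
    return result


def retract_prop(s: State, p: int) -> None:
    """Assumed inverse of extend_prop; p must be on top of the stack.
    The unit stack is left alone: the caller is expected to flush it."""
    assert s.stack and s.stack[-1] == p
    s.stack.pop()
    satisfied, shortened = _split(s, p, s.value[p])
    for cl in shortened:
        if cl.sub is None:
            cl.open += 1
    for cl in satisfied:
        if cl.sub == p:
            cl.sub = None
            s.open += 1
    s.value[p], s.mode[p] = U, None


s = State([[1, 2], [-1, 3], [-1, -3]])
print(extend_prop(s, 1, T, LS), extend_prop(s, 3, T, UN))
retract_prop(s, 3)
retract_prop(s, 1)
print(s.open, [c.open for c in s.clauses])
```

Both primitives only visit the clauses in which p occurs, which is what makes the bookkeeping of unary and subsumed clauses cheap compared with a scan of all the clauses.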

3.2 DLL-Solve, Look-Ahead, and Look-Back in SIM 

In Figure 2 we present our implementation of the DLL algorithm. In the figure,
Check-SAT(s) returns T if open[s] = 0, and F otherwise; s in DLL-Solve has
to be the initial state corresponding to the input formula. DLL-Solve returns T
if the input formula is satisfiable and F otherwise.

3.3 Heuristic in SIM 

A heuristic is a function that, given an open state s and an open proposition p in
s, computes a score for p and a branching order determining whether p should be
satisfied or falsified first. The optimal heuristic is the one which chooses an open
proposition and a branching order that will cause the exploration of fewer nodes.
Unfortunately, deciding whether a proposition p is optimal is harder than deciding
the satisfiability of the formula itself [16], so we resort to approximations: a good
approximate heuristic does not require a lot of computation time and gives higher
scores to the propositions that, once assigned, lead to simpler problems.

Look-Ahead(s)
 1  r ← U
 2  while r = U and length[Unit[s]] > 0 do
 3      cl ← Pop(Unit[s])
 4      if sub[cl] = NIL then
 5          i ← 1
 6          repeat
 7              l ← Lits[cl][i]
 8              i ← i + 1
 9          until value[prop[l]] = U
10          r ← Extend-Prop(s, prop[l], v[l], UN)
11  if r = U and open[s] = 0 then
12      return T
13  else
14      return r

DLL-Solve(s)
 1  repeat
 2      case Look-Ahead(s) of
 3          T : return T
 4          F : p ← Look-Back(s, v, m)
 5          U : p ← Heuristic(s, v, m)
 6      if p ≠ NIL then
 7          Extend-Prop(s, p, v, m)
 8  until p = NIL
 9  return Check-SAT(s)

Look-Back(s, var v, var m)
 1  Flush(Unit[s])
 2  repeat
 3      p ← Top(Stack[s])
 4      if mode[p] ≠ LS then
 5          Retract-Prop(s, p)
 6  until length[Stack[s]] = 0 or mode[p] = LS
 7  if length[Stack[s]] > 0 then
 8      if value[p] = T then
 9          v ← F
10      else
11          v ← T
12      m ← RS
13      Retract-Prop(s, p)
14  else
15      p ← NIL
16  return p

Fig. 2. Implementation of the DLL algorithm in SIM.
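
As an illustration, the control flow of Figure 2 can be sketched as a small runnable Python program. This is not SIM's implementation: unit clauses are found here by a simple-minded linear scan instead of the Unit stack, the state is a plain dictionary plus a list of (proposition, mode) pairs, and `heuristic` is a placeholder that picks the first open proposition. All names are assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

T, F, U = True, False, None       # U stands for "open"
UN, LS, RS = "UN", "LS", "RS"


def lit_value(value: Dict[int, Optional[bool]], l: int) -> Optional[bool]:
    v = value[abs(l)]
    return U if v is U else (v == (l > 0))


def look_ahead(clauses, value, stack) -> Optional[bool]:
    """Unit propagation: T if every clause is satisfied, F on an empty clause, else U."""
    while True:
        unit, all_sat = None, True
        for c in clauses:
            vals = [lit_value(value, l) for l in c]
            if any(x is True for x in vals):
                continue                      # clause already subsumed
            all_sat = False
            open_lits = [l for l, x in zip(c, vals) if x is U]
            if not open_lits:
                return F                      # empty clause
            if len(open_lits) == 1 and unit is None:
                unit = open_lits[0]
        if all_sat:
            return T
        if unit is None:
            return U
        value[abs(unit)] = unit > 0           # assign the unit literal
        stack.append((abs(unit), UN))


def look_back(value, stack) -> Tuple[Optional[int], Optional[bool], Optional[str]]:
    """Chronological backtracking: undo up to the latest left split and flip it."""
    while stack:
        p, mode = stack.pop()
        v = value[p]
        value[p] = U
        if mode == LS:
            return p, (not v), RS
    return None, None, None


def heuristic(value):
    """Placeholder heuristic: first open proposition, tried as T first."""
    for p in sorted(value):
        if value[p] is U:
            return p, T, LS
    return None, None, None


def dll_solve(clauses: List[List[int]]) -> bool:
    value: Dict[int, Optional[bool]] = {abs(l): U for c in clauses for l in c}
    stack: List[Tuple[int, str]] = []
    while True:
        r = look_ahead(clauses, value, stack)
        if r is T:
            return True
        p, v, m = look_back(value, stack) if r is F else heuristic(value)
        if p is None:
            return False                      # nothing left to flip: unsatisfiable
        value[p] = v
        stack.append((p, m))


print(dll_solve([[1, 2], [-1, 2], [1, -2]]))
print(dll_solve([[1, 2], [-1, 2], [1, -2], [-1, -2]]))
```

The first formula is satisfied by assigning T to both propositions, while the second contains all four clauses over two propositions and is therefore unsatisfiable.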



We now describe in detail the branching heuristics that we have tested. Our
interest lies in evaluating some of the search heuristics that have been proposed. We
restrict our attention to RELSAT’s, SATO’s, and SATZ’s default branching
heuristics, and we compare them with a new heuristic (called UNITIE2) that pushes
“SATZ’s philosophy” to the limit: it always examines every open proposition at
each branching node. The heuristics of RELSAT, SATO and SATZ have been
implemented by looking at the available literature, but in particular by carefully
inspecting the solvers’ code. SATZ’s heuristic is described in detail in [11] (notice,
however, that the heuristic in SATZ213 has changed). A
high-level description of SATO’s heuristic is given in [13]. Here we give a much
more detailed presentation of it, as we understood it from the code and as we
implemented it in SIM. We are not aware of any detailed description of RELSAT’s
heuristic, so we present it according to its implementation in RELSAT and SIM.

We start by describing SATO’s heuristic. Then, before discussing RELSAT’s
heuristic, we present the simpler UNITIE2, and reuse parts of its description also for
RELSAT.

SATO’s heuristic: before the search begins, one of the following options is selected:

1. if the percentage of non-Horn clauses in the input clauses is bigger than 28.54
of the total, then the heuristic scores all the open propositions,

2. otherwise, the heuristic scores only the first 7 propositions occurring in the
first 7 shortest non-Horn clauses.

The score of a proposition p is given by (pos_p + 1) × (neg_p + 1), where pos_p is the
number of open binary clauses in which p occurs positively, and analogously for neg_p. The
branching is on the proposition with the highest score, its value determined in the
following way: if the percentage of non-Horn input clauses is less (resp. greater)
than 2.36 of the total, then the value of p is T (resp. F) if pos_p > neg_p and F
(resp. T) otherwise.
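
As an illustration, the score and the choice of the first branching value can be written down directly in Python. The function names are assumptions made for this example, and the behavior when the percentage is exactly 2.36 is not specified by the text: here it is arbitrarily treated like the “greater” case.

```python
def sato_score(pos_p: int, neg_p: int) -> int:
    """(pos_p + 1) x (neg_p + 1), with pos_p / neg_p counted on open binary clauses."""
    return (pos_p + 1) * (neg_p + 1)


def sato_first_value(pos_p: int, neg_p: int, non_horn_percent: float) -> bool:
    """Value tried first for the selected proposition (True stands for T)."""
    prefers_true = pos_p > neg_p
    if non_horn_percent < 2.36:
        return prefers_true
    return not prefers_true          # assumption: 2.36 itself falls in this case


print(sato_score(2, 3), sato_first_value(3, 1, 1.0), sato_first_value(3, 1, 10.0))
```

The product rewards propositions that occur in binary clauses with both signs: a proposition with pos_p = 2 and neg_p = 3 scores 12, whereas one with pos_p = 5 and neg_p = 0 scores only 6.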

UNITIE2 heuristic: For each open proposition p, this heuristic scores p on the basis
of the effective simplifications that would be caused by assigning p on one side,
and then ¬p on the other. To compute the simplifications caused by assigning
p, a “lean” version of Extend-Prop and Look-Ahead is used. In particular,
the lean versions take care of unit resolutions only. UNITIE2 also incorporates a
failed literal detection mechanism: if assigning p would cause a contradiction (in
the Look-Ahead), it safely assigns ¬p as if it was a unit (analogously for ¬p).
Assuming that there are no failed literals, and that assigning p (resp. ¬p) leads to a set
of clauses C⁺ (resp. C⁻), the score of p is given by 2 × binposnew × binnegnew +
unitpos + unitneg, where binposnew (resp. binnegnew) is the number of binary
clauses in C⁺ (resp. C⁻) and not in C, and unitpos (resp. unitneg) is the number
of unit propagations in the Look-Ahead after p (resp. ¬p) is assigned. Notice
that UNITIE2 always scores all the open propositions. SATZ [11], under certain
conditions, scores only a subset of the open propositions. As we said, UNITIE2
pushes SATZ’s philosophy to the limit. However, UNITIE2 uses a slightly different
formula for computing the score of propositions. In fact, SATZ’s formula (in our
notation) is 1024 × binposnew × binnegnew + binposnew + binnegnew + 1. The idea
behind UNITIE2 is to use unitpos + unitneg to break ties between propositions
having the same 2 × binposnew × binnegnew.
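
As an illustration, the two scoring formulas can be compared side by side in Python. Only the arithmetic is shown: computing binposnew, binnegnew, unitpos and unitneg requires the lean look-ahead, which is not reproduced here, and the function names are assumptions made for this example.

```python
def unitie2_score(binposnew: int, binnegnew: int, unitpos: int, unitneg: int) -> int:
    """2 x binposnew x binnegnew + unitpos + unitneg."""
    return 2 * binposnew * binnegnew + unitpos + unitneg


def satz_score(binposnew: int, binnegnew: int) -> int:
    """1024 x binposnew x binnegnew + binposnew + binnegnew + 1."""
    return 1024 * binposnew * binnegnew + binposnew + binnegnew + 1


# Two propositions with the same product binposnew x binnegnew = 6:
print(unitie2_score(2, 3, 1, 4), unitie2_score(3, 2, 6, 2))
print(satz_score(2, 3), satz_score(3, 2))
```

In the example SATZ's formula gives both propositions the same score, while the UNITIE2 formula separates them through the number of unit propagations.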

RELSAT’s heuristic: RELSAT also scores an open proposition p on the basis of
the effective simplifications that would be caused by assigning p on one side, and
then ¬p on the other. However, it does this differently from UNITIE2. For each
open proposition p, RELSAT first checks whether (i) the number of binary clauses
in which p occurs positively has increased since the previous branching node, and
(ii) p has not already been assigned in the heuristic (i.e., p has not been assigned
by unit propagation while scoring another proposition). If this is the case, then
RELSAT assigns ¬p and does the consequent (lean) Look-Ahead. If ¬p fails,
then ¬p is immediately assigned, causing the contradiction and the immediate
backtrack. (The contradiction arising by assigning ¬p is explicitly generated in order
to have the necessary interactions with backjumping and learning, see next section.)
If ¬p does not fail, the number of unit propagations performed in the
Look-Ahead is counted as the score of p. If one of the conditions (i), (ii) is
false, and if p occurs positively in some binary clauses, then the score pos_p of p
is the number of binary clauses in which p occurs positively. All the above has to
be done analogously for computing neg_p, corresponding to ¬p. The above rules
produce some non-null pos_p or neg_p if there are binary clauses. If there are not,
then RELSAT applies another strategy: for each proposition p, pos_p is given by
256 × c_3 + 16 × c_4 + 4 × c_5 + c_6, where c_i (3 ≤ i ≤ 5) is the number of clauses in
which the number of open literals is i, and c_6 is the number of clauses in which
the number of open literals is greater than or equal to 6. Analogously for neg_p. At
this point, for some proposition p we have two scores pos_p, neg_p, one of which is
distinct from 0. For all these propositions p, we compute a single score given by
2 × pos_p × neg_p + pos_p + neg_p + 1. Finally, RELSAT chooses randomly among
the propositions whose last computed score is > 0.9 of the best score, and fixes
its value to T or F randomly.

Fig. 3. Basic SIM. Left: random samples. Right: real world instances.
(Legend: SIMRELSAT, SIMUNITIE2, SIMSATZ, SIMSATO.)
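
As an illustration, the last steps of RELSAT's heuristic (the length-based fallback score, the combined score, and the randomized choice) can be sketched in Python. The function names and the dictionary-based interface are assumptions made for this example, and the look-ahead-based part of the scoring is not reproduced.

```python
import random
from typing import Dict, Tuple


def length_score(c3: int, c4: int, c5: int, c6: int) -> int:
    """Fallback used when there are no binary clauses: 256 c_3 + 16 c_4 + 4 c_5 + c_6."""
    return 256 * c3 + 16 * c4 + 4 * c5 + c6


def combined_score(pos_p: int, neg_p: int) -> int:
    """Single score: 2 x pos_p x neg_p + pos_p + neg_p + 1."""
    return 2 * pos_p * neg_p + pos_p + neg_p + 1


def choose(scores: Dict[int, int], rng: random.Random) -> Tuple[int, bool]:
    """Pick at random among the propositions scoring more than 0.9 of the best,
    then pick the first value at random as well."""
    best = max(scores.values())
    candidates = sorted(p for p, sc in scores.items() if sc > 0.9 * best)
    return rng.choice(candidates), rng.choice([True, False])


scores = {1: combined_score(7, 7), 2: combined_score(6, 7), 3: combined_score(2, 3)}
print(scores, choose(scores, random.Random(0)))
```

With the scores above (113, 98 and 18) only proposition 1 exceeds 0.9 of the best score, so the random choice affects only its first value; with closer scores several propositions would compete.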



3.4 Experimental Comparison with SIM 

We now look at the experimental analysis. Figure 3 shows SIM’s performances
when using the different heuristics. In the figure, SIMSATO means SIM using SATO’s
heuristic, and similarly for the others. Consider Figure 3. Our plots on the
random problems are similar to the ones in Section 2: SIMSATZ performs better than
SIMRELSAT and SIMSATO, these last two systems performing roughly in the same
way. The good performances of SIMSATZ on random problems are highlighted by
Table 1, showing the minimum, median, IQ-range and maximum of the CPU
time of the solvers at the phase transition. (Out of 100 numbers listed in ascending
order, the Q-percentile is the Q-th in the list. The minimum is thus the 1-percentile,
the median the 50-percentile, the maximum the 100-percentile, and the IQ-range is
the 75-percentile minus the 25-percentile.) Besides having the lowest median
CPU time, SIMSATZ is also the system whose performances have the least variation.

From the plots on the real world problems, we see that SIMSATO (with 130
problems solved) is much worse than SIMSATZ (140), SIMUNITIE2 (141) and, above
all, SIMRELSAT (146). On the other hand, SIMSATO is the fastest on the 11
quasigroup problems: this is not surprising given that this heuristic has been tuned on
these problems [13]. SIMUNITIE2 gives reasonable timings on the two test sets.

Table 1. Basic SIM. Random tests. Performances at the phase transition (CPU time, in seconds).

Data        SIMSATO   SIMRELSAT   SIMUNITIE2   SIMSATZ
Min           31.56       29.24         7.09      1.53
Median        97.18       89.95        36.27      6.00
IQ-range      70.33       66.38        25.34      3.60
Max          256.45      348.16       121.43     14.96
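
As an illustration, the four statistics reported for each solver can be computed from 100 CPU times with the percentile convention stated in the text (the Q-percentile is the Q-th of the 100 values in ascending order). The function names are assumptions made for this example, and the input data in the usage lines are synthetic.

```python
import random
from typing import Dict, List


def q_percentile(times: List[float], q: int) -> float:
    """The q-th value (1-based) of exactly 100 samples listed in ascending order."""
    if len(times) != 100 or not 1 <= q <= 100:
        raise ValueError("expected 100 samples and 1 <= q <= 100")
    return sorted(times)[q - 1]


def summarize(times: List[float]) -> Dict[str, float]:
    return {
        "Min": q_percentile(times, 1),
        "Median": q_percentile(times, 50),
        "IQ-range": q_percentile(times, 75) - q_percentile(times, 25),
        "Max": q_percentile(times, 100),
    }


times = [float(i) for i in range(1, 101)]   # synthetic data, not measurements
random.Random(0).shuffle(times)
print(summarize(times))
```

Note that under this convention the median of 100 samples is the 50th value, not the average of the 50th and 51st as computed by some statistics libraries.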



4 Conflict-Directed Backjumping in SIM 

Since the basic DLL algorithm relies on simple chronological back-tracking, it is 
not infrequent for DLL implementations to keep exploring a possibly large subtree 
whose leaves are all dead-ends. This phenomenon occurs also when the formula is 
satisfiable, but some choice performed way up in the search tree is responsible for 
the constraints to be violated. The solution, borrowed from constraint network 
solving [17], is to jump back over the choices that were not at the root of the 
conflict, whenever one is found. The corresponding technique is widely known as 
(conflict-directed) backjumping (CBJ). 



4.1 CBJ-Look-Back in SIM 

The function CBJ-Look-Back in Figure 4 (left) implements conflict-directed
backjumping in SIM. In the figure,

— wr is meant to store the subset of the propositions in Stack[s] whose
assignment is a reason for the discovered inconsistencies. Technically, wr is a
clause initialized by Init-Wr(s), which returns a clause cl in Clauses[s] having
open[cl] = 0. At least one such clause exists, given that when CBJ-Look-Back
is invoked, at least one empty clause belongs to Clauses[s].

— reason is a new attribute of the proposition data type and it is a clause.

— Update-Wr(p, wr) returns a clause cl such that:

   — open[cl] = 0;

   — sub[cl] = NIL;

   — Lits[cl] is the array of literals obtained by merging (without duplications)
   the literals of reason[p] and Lits[wr]. Then the literals l ∈ Lits[cl] having
   prop[l] = p are eliminated from Lits[cl].

Indeed, the procedure Look-Ahead in Figure 2 needs to be extended in order to
set the reason of p when p is assigned by unit. The following line of code has to be
written below line 9 of Look-Ahead, at the same level of indentation as line 9:

reason[prop[l]] ← cl

For a high-level description of CBJ-Look-Back, see [10].

CBJ-Look-Back(s, var v, var m)          CBJ-Learn-Look-Back(s, var v, var m)
 1 Flush(Unit[s])                        1 Flush(Unit[s])
 2 wr ← Init-Wr(s)                       2 wr ← Init-Wr(s)
 3 repeat                                3 repeat
 4   p ← Top(Stack[s])                   4   p ← Top(Stack[s])
 5   if p ∈ wr then                      5   if p ∈ wr then
 6     if mode[p] ∈ {UN, RS} then        6     if mode[p] ∈ {UN, RS} then
 7       wr ← Update-Wr(p, wr)           7       wr ← Update-Wr(p, wr)
 8       Retract-Prop(s, p)              8       Learn(s, wr)
 9     else                              9       Forget(s, p)
10       if value[p] = T then           10       Retract-Prop(s, p)
11         v ← F                        11     else
12       else                           12       if value[p] = T then
13         v ← T                        13         v ← F
14       m ← RS                         14       else
15       Retract-Prop(s, p)             15         v ← T
16       reason[p] ← wr                 16       m ← RS
17       return p                       17       Retract-Prop(s, p)
18   else                               18       case Look-Ahead(s) of
19     Retract-Prop(s, p)               19         T: return NIL
20 until length[Stack[s]] = 0           20         F: Flush(Unit[s])
21 return NIL                           21            wr ← Init-Wr(s)
                                        22         U: if value[p] = U then
                                        23              reason[p] ← wr
                                        24              return p
                                        25            else
                                        26              return Heuristic(s, v, m)
                                        27   else
                                        28     Retract-Prop(s, p)
                                        29 until length[Stack[s]] = 0
                                        30 return NIL

Fig. 4. Conflict-directed backjumping and learning in SIM.
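
As an illustration, the literal-level effect of Update-Wr can be sketched in Python. Clauses are represented here simply as lists of signed integers, and the attributes open and sub of the returned clause are left out; both choices, like the function name, are assumptions made for this example.

```python
from typing import List


def update_wr(p: int, wr: List[int], reason_p: List[int]) -> List[int]:
    """Merge the literals of reason[p] and wr without duplicates,
    then drop every literal whose proposition is p."""
    merged = list(dict.fromkeys(list(reason_p) + list(wr)))   # keeps first occurrences
    return [l for l in merged if abs(l) != p]


# p = 3 was assigned by unit propagation because of the clause {3, -2};
# the current working reason is the empty clause {-3, -1}.
print(update_wr(3, [-3, -1], [3, -2]))
```

Seen this way, each call performs a resolution step on p between the working reason and the clause that forced p, so the result mentions only propositions assigned before p.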



4.2 Experimental Comparison with SIM 

Figure 5 shows SIM’s results when incorporating backjumping. As can be
observed from the figure on the left, backjumping does not produce benefits on the
random tests. Indeed, on the random problems the reason of an inconsistency is
almost never localized to a small subset of the input clauses. It is nevertheless
interesting to have a look at Table 2, corresponding to Table 1 in Section 3. By
comparing the Min of the two tables, we see that for SIMSATO and SIMRELSAT,
i.e., the two systems whose initial branching literals are selected using a heuristic
not based on Look-Ahead, backjumping may produce some benefits. The fact
that backjumping never helps for SIMUNITIE2 and SIMSATZ is an indication that
for random instances, Look-Ahead-based heuristics are a good choice. Notice
however that the overhead due to backjumping is limited: comparing the Max, we
see that it is roughly on the order of 20% or less. We take this as an indication
that our implementation of backjumping is good.

Fig. 5. SIM with backjumping. Left: random samples (x-axis: clauses/variables ratio).
Right: real world instances (x-axis: 30-second time slices).

Table 2. SIM with backjumping. Random tests. Performances at the phase transition (CPU time, in seconds).

Data        SIMSATO   SIMRELSAT   SIMUNITIE2   SIMSATZ
Min           32.95        8.44         8.35      1.86
Median       102.95       88.61        42.59      7.23
IQ-range      73.32       75.42        30.61      4.42
Max          278.29      395.23       143.35     18.15



Looking at the results on the real world tests (Figure 5), we see that RELSAT’s
heuristic (153 instances solved) still produces some advantages, and that the
difference with SATO’s heuristic (146 solved) is not as big as before. SIMSATZ and
SIMUNITIE2 solve respectively 147 and 144 instances. SIMUNITIE2 has some
advantage at the beginning, but at the end it loses it. Comparing the data with
the corresponding ones in Section 3, we see that for each heuristic that we used,
backjumping allows more instances to be solved.

5 Learning in SIM 

Backjumping can be very effective in “shaking” the solver from regions where
no solutions can be found, but since the cause of the conflict is discarded as soon
as it gets mended, the solver may get repeatedly stuck in such regions. To escape
this pattern, some sort of global knowledge is needed: the negation of the causes
of the conflicts may be added as an additional constraint (i.e., as a clause) that
has to be satisfied. Adding all the clauses corresponding to the reasons of the
discovered conflicts has the advantage that the same mistake is never repeated.
However, this may cause an exponential blow up of the size of the formula. It is
therefore necessary to introduce some limit to the number of stored clauses, i.e.,
a mechanism that enables the solver to “forget” clauses.

In size learning, a clause corresponding to a conflict is stored if it has no more
than Order literals. Size learning is implemented in SATO, and is used by default
with Order=20. In relevance learning, a clause corresponding to a conflict is always
stored. A “forget” mechanism eliminates the learned clauses in which more than
Order literals are open or have been assigned differently since the time they were
added to the set of input clauses. Relevance learning is implemented by
RELSAT, and is used by default with Order=3.

Fig. 6. SIM with learning. Real world instances (200 instances). Left: size vs relevance
learning. Right: different heuristics.



5.1 CBJ-Learn-Look-Back in SIM 

The procedure CB J-Learn-Look-Back in Figure 4. right, implements learning 
in SIM. In the figure, Order is yet another attribute of a state (set initially by an 
input parameter), while the calls Learn(s, wr) and Forget(s, p) have different 
meaning depending on whether they do “relevance learning” as relsat, or “size 
learning” as SATO. 

size learning: If wr is a clause with fewer than Order[s] literals, Learn(s, wr)
adds the clause wr to Clauses[s], and for each literal l in Lits[wr], adds wr to
Pos[prop[l]] (resp. Neg[prop[l]]) if l is positive (resp. negative). Forget(s, p) is
the null instruction.

relevance learning: Learn(s, wr) performs the above operations in any case,
i.e., also if wr is a clause with more than Order[s] literals. Forget(s, p) deletes the
learned clauses cl such that |{l : l ∈ Lits[cl], value[prop[l]] ∈ {u[?], u}}| > Order[s].
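
The two policies can be sketched in Python as follows. This is only an illustration and not SIM's code: clauses are frozensets of nonzero integers, `value` maps an atom to True, False or None (open), and the names `learn` and `forget`, the `policy` argument, and the reading of "assigned differently" as "the literal is no longer false" are all assumptions made for the example.

```python
def learn(learned, reason, order, policy):
    """Store the conflict clause `reason` according to the learning policy."""
    # Size learning keeps only short clauses; relevance learning keeps all.
    if policy == "relevance" or len(reason) < order:
        learned.append(frozenset(reason))


def forget(learned, value, order, policy):
    """Return the learned clauses that survive the forget step."""
    if policy == "size":
        return list(learned)  # Forget is the null instruction

    def drifted(clause):
        # A learned clause has all literals false when it is added.
        # Count literals that are now open (None) or no longer false.
        return sum(1 for lit in clause if value.get(abs(lit)) != (lit < 0))

    return [c for c in learned if drifted(c) <= order]
```

With Order=1, a learned clause {1, -2, 3} is kept while all three literals are still false, and dropped once two of them are open or true.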

Notice the call to Look-Ahead (line 18) and the following piece of code. 
Indeed, this call is necessary because of the possible presence of unary clauses 
among the ones that we learned. 



5.2 Experimental Comparison with SIM 

For lack of space we do not show the plots on the random instances. They have
the same qualitative behavior as the ones we already showed, and are available at
SIM’s web site. Here we only remark that learning does not produce benefits on
random instances, but its overhead is always quite small.

On the real world instances, [10] reports that relevance learning is more ef- 
fective than size learning. Here we show only one plot (Figure 6. left) in which 
we compare SIMSATO with relevance learning Order=3, and SIMSATO with size
learning Order=20 as in SATO. As can be seen, the plot with relevance learning
dominates the one with size learning.

Table 3. Solvers performances on random and real world tests.

            POSIT  EQSATZ  SATZ213    SATO  RELSAT  SIMSATZ  SIMRELSAT  SIMSATO  SIMUNITIE2
Random
  Min        3.36   40.90     1.77   10.68   30.59     1.53      29.24    31.56        7.09
  Median     9.88  112.52     4.26   81.31   88.15     6.00      89.95    97.18       36.25
  IQ-range   7.39   73.76     2.46   60.74   63.99     3.60      66.38    70.33       25.34
  Max       39.76  323.75    10.91  284.99  344.77    14.96     348.16   256.45      121.43
Real World
  #Solved     126     137      141     163     164      166        159      156         148

On the basis of the above, we tested the various heuristics only with relevance
learning Order=3. The results are shown in Figure 6. right. We see that the
heuristic which allows SIM to solve more instances is the one featured by SATZ: SIMSATZ
solves 166 instances, SIMRELSAT 159, SIMSATO 156, SIMUNITIE2 148. Interestingly,
we have seen that the unitie2 heuristic (on the real world tests) was second only
to relsat’s heuristic when used in basic SIM. However, if we include backjumping
and/or learning, we see that its performance gets worse in comparison with
the other heuristics. Analogously, SIMSATZ’s performance becomes better as soon
as we introduce backjumping and/or learning. These are clear indications of the
obvious fact that the benefits of a heuristic depend on its interactions with the
other implemented techniques (see also [18]).

6 Final Remarks 

We conclude the paper with Table 3, comparing SIM with the various settings, and
the solvers that we considered in Section 2. From this table, we see that SIMSATZ
solves more real world instances than all the other solvers, including SIM with
other heuristics. This last fact shows that the good performances of relsat and
SATO on real world instances (see Section 2) are due to their learning mechanisms.
As we said, SIMSATZ is not always the best on all the classes of problems: different
heuristics behave better than others on different problems. Tuning the heuristic
on specific classes pays off. SIMSATZ is also the best, among the SIM-systems, for
solving random problems, and its performance is not far from that of SATZ213.
Having one good setting for both random and real world instances
is indeed very positive.

The good performances of SIMSATZ also indicate that the quality of our im- 
plementation is comparable to the corresponding state-of-the-art solvers. Indeed, 
if we compare the performances of SIMSATZ on real world instances as shown in 
Figure 6. right, with relsat’s and SATO’s corresponding performances (see Fig- 
ure 1. right), we see that both relsat and SATO are able to solve more instances 
than SIMSATZ in the first 700 seconds. Our explanation is that relsat and SATO
feature rather sophisticated forms of pre-processing of the input clauses, while
SIM has no pre-processing at all. Indeed, pre-processing allows for smaller solution
times, and we are working on its implementation.

Finally, we showed only a small part of the data that we have collected. For
example, we have a figure comparing SIMRELSAT and relsat on the random tests:
the plots overlap and are hard to distinguish. The same holds for SIMSATO
and SATO.¹ These plots show that our reconstruction of their branching heuristics
is accurate.

All the plots, SIM, and more information about SIM are available at the SIM web
page:

www.mrg.dist.unige.it/star/sim



Acknowledgements. We would like to thank Holger H. Hoos and Thomas
Stützle for the excellent work they do with SATLIB. This work is partially
supported by MURST and Intel Corp.

References 

1. J. Gu, P. W. Purdom, J. Franco, and B. W. Wah. Algorithms for the satisfia-
bility (SAT) problem: A survey. In Satisfiability Problem: Theory and Applications,
DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages
19-153. AMS, 1997.

2. D. G. Mitchell, B. Selman, and H. J. Levesque. Hard and Easy Distributions for
SAT Problems. In Proc. of AAAI, pages 459-465. AAAI Press, 1992.

3. J. Franco and M. Paull. Probabilistic analysis of the Davis-Putnam procedure for
solving the satisfiability problem. Discrete Applied Mathematics, 5:77-87, 1983.

4. Bart Selman, Henry Kautz, and David McAllester. Ten Challenges in Propositional
Reasoning and Search. In Proc. of IJCAI, pages 50-54. Morgan Kaufmann, 1997.

5. Massacci and Marraro. Logical Cryptanalysis as a SAT Problem. JAR: Journal of 
Automated Reasoning, 24, 2000. 

6. Chu Min Li. Integrating Equivalency Reasoning into Davis-Putnam Procedure. In 
Proc. of AAAI. AAAI Press, 2000. 

7. T. E. Uribe and M. E. Stickel. Ordered Binary Decision Diagrams and the Davis- 
Putnam Procedure. In Proc. of the 1st International Conference on Constraints in 
Computational Logics, 1994. 

8. A. Biere, A. Cimatti, E. Clarke, and Y. Zhu. Symbolic Model Checking without 
BDDs. In Proceedings of TACAS, volume 1579 of LNCS, pages 193-207. Springer 
Verlag, 1999. 

9. H. Zhang and M. E. Stickel. Implementing the Davis-Putnam Method. In Highlights
of Satisfiability Research in the Year 2000. IOS Press, 2000.

¹ This holds for the median time. Indeed, by looking at Table 3, we see that SIMSATO’s
and SATO’s performances are not as similar as SIMRELSAT’s and relsat’s performances
are. This is due to the particular heuristic of SATO, which in some cases chooses the
first 7 propositions in the first 7 non-Horn clauses. Indeed, the selected proposition
(and thus the behavior) depends also on the order in which clauses are stored, and
in this respect SIMSATO is different from SATO.







10. R. J. Bayardo, Jr. and R. C. Schrag. Using CSP Look-Back Techniques to Solve 
Real-World SAT instances. In Proc. of AAAI, pages 203-208. AAAI Press, 1997. 

11. Chu Min Li and Anbulagan. Heuristics Based on Unit Propagation for Satisfiability
Problems. In Proc. of IJCAI, pages 366-371. Morgan Kaufmann, 1997.

12. Jon W. Freeman. Improvements to propositional satisfiability search algorithms. PhD 
thesis, University of Pennsylvania, 1995. 

13. H. Zhang. SATO: An efficient propositional prover. In Proc. of CADE, volume 1249 
of LNAI, pages 272-275. Springer Verlag, 1997. 

14. M. Davis, G. Logemann, and D. Loveland. A machine program for theorem proving.
Communications of the ACM, 5(7):394-397, 1962.

15. T. H. Cormen, C. E. Leiserson, and R. R. Rivest. Introduction to Algorithms. MIT
Press, 1998.

16. P. Liberatore. On the complexity of choosing the branching literal in DPLL. Artificial
Intelligence, 116(1-2):315-326, 2000.

17. R. Dechter, I. Meiri, and J. Pearl. Temporal Constraint Networks. Artificial Intel- 
ligence, 49:61-95, 1991. 

18. F. Copty, L. Fix, E. Giunchiglia, G. Kamhi, A. Tacchella, and M. Vardi. Benefits of 
Bounded Model Checking at an Industrial Setting. In Proc. of CAV, LNCS. Springer 
Verlag, 2001. To appear. 




QuBE: A System for Deciding Quantified 
Boolean Formulas Satisfiability* 



Enrico Giunchiglia, Massimo Narizzano, and Armando Tacchella 
DIST, Università di Genova, Viale Causa 13, 16145 Genova - Italy



1 Introduction 

Deciding the satisfiability of a Quantified Boolean Formula (QBF) is an
important research issue in Artificial Intelligence. Many reasoning tasks involving
planning [1], abduction, reasoning about knowledge, and non-monotonic reasoning [2] can
be directly mapped into the problem of deciding the satisfiability of a QBF.

In this paper we present QuBE, a system for deciding QBF satisfiability.
We start our presentation in § 2 with some terminology and definitions necessary
for the rest of the paper. In § 3 we present a high level description of QuBE’s
basic algorithm. QuBE’s available options are described in § 4. We end our
presentation in § 5 with some experimental results showing QuBE’s effectiveness
in comparison with other systems. QuBE, and more information about QuBE,
are available at www.mrg.dist.unige.it/star/qube.

2 Formal Preliminaries 

Consider a set P of propositional letters. An atom is an element of P. A literal
is an atom or the negation of an atom. For each literal l, (i) l̄ is x if l = ¬x, and
is ¬x if l = x; (ii) |l| is the atom occurring in l. A clause C is an n-ary (n ≥ 0)
disjunction of literals such that no atom occurs twice in C. A propositional
formula is a k-ary (k ≥ 0) conjunction of clauses. As customary, we represent a
clause as a set of literals, and a propositional formula as a set of clauses.

A QBF is an expression of the form

Q1x1 . . . Qnxn Φ, (n ≥ 0) (1)

where every Qi (1 ≤ i ≤ n) is a quantifier, either existential ∃ or universal ∀;
x1, . . . , xn are pairwise distinct atoms in P; and Φ is a propositional formula in
the atoms x1, . . . , xn. Q1x1 . . . Qnxn is the prefix and Φ is the (quantifier-free)
matrix of (1).

Consider a QBF of the form (1). A literal l occurring in Φ is:

- existential if ∃|l| belongs to the prefix of (1), and is universal otherwise.

- unit in (1) if l is existential, and, for some k ≥ 0,

  - a clause {l, l1, . . . , lk} belongs to Φ, and

  - each expression ∀|li| (1 ≤ i ≤ k) is at the right of ∃|l| in the prefix of (1).

- monotone if either l is existential, l occurs in Φ, and l̄ does not occur in Φ;
or l is universal, l does not occur in Φ, and l̄ occurs in Φ.

A clause C is contradictory if no existential literal belongs to C.

 1  φ = (the input QBF);
 2  Stack = (the empty stack);

 3  function Simplify() {
 4    do {
 5      φ' = φ;
 6      if ((a contradictory clause is in φ))
 7        return False;
 8      if ((the matrix of φ is empty))
 9        return True;
10      if ((l is unit in φ))
11        { |l|.mode = UNIT; Extend(l); }
12      if ((l is monotone in φ))
13        { |l|.mode = PURE; Extend(l); }
14    } while (φ' != φ);
15    return Undef; }

16  function Backtrack(res) {
17    while ((Stack is not empty)) {
18      l = Retract();
19      if ((|l|.mode == L-SPLIT) &&
20          ((res == False && |l|.type == ∃) ||
21           (res == True && |l|.type == ∀)))
22        { |l|.mode = R-SPLIT; return l̄; } }
23    return Null; }

24  function QubeSolver() {
25    do {
26      res = Simplify();
27      if (res == Undef) l = ChooseLiteral();
28      else l = Backtrack(res);
29      if (l != Null) Extend(l);
30    } while (l != Null);
31    return res; }

Fig. 1. The algorithm of QuBE.

* We wish to thank Marco Cadoli, Rainer Feldmann, Theodor Lettman, Jussi Rintanen,
Marco Schaerf and Stefan Schamberger for providing us with their systems and
helping us to figure them out during our experimental analysis. This work has been
partially supported by ASI and MURST.

The semantics of a QBF φ can be defined recursively as follows. If φ contains
a contradictory clause then φ is not satisfiable. If the matrix of φ is empty then
φ is satisfiable. If φ is ∃xψ (resp. ∀xψ), φ is satisfiable if and only if φ_x or (resp.
and) φ_¬x are satisfiable. If φ = Qxψ is a QBF and l is a literal, φ_l is the QBF
obtained from ψ by deleting the clauses in which l occurs, and removing l̄ from
the others. It is easy to see that if φ is a QBF without universal quantifiers, the
problem of deciding φ’s satisfiability reduces to propositional satisfiability (SAT).
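
The recursive definition above translates almost literally into a small evaluator. The following Python sketch is only an illustration of the semantics, not part of QuBE: the representation (a prefix as a list of `('E', atom)` or `('A', atom)` pairs, outermost first, and clauses as sets of nonzero integers) and the names `assign` and `qbf_satisfiable` are assumptions made for the example.

```python
def assign(clauses, lit):
    """Delete the clauses containing `lit` and remove its complement elsewhere."""
    return [c - {-lit} for c in clauses if lit not in c]


def qbf_satisfiable(prefix, clauses):
    """Evaluate a closed QBF by following the recursive semantics."""
    existential = {x for q, x in prefix if q == 'E'}
    # A clause with no existential literal is contradictory.
    if any(all(abs(lit) not in existential for lit in c) for c in clauses):
        return False
    if not clauses:
        return True
    (q, x), rest = prefix[0], prefix[1:]
    branches = (qbf_satisfiable(rest, assign(clauses, lit)) for lit in (x, -x))
    # Existential: one branch suffices. Universal: both are needed.
    return any(branches) if q == 'E' else all(branches)


matrix = [frozenset({1, 2}), frozenset({-1, -2})]
print(qbf_satisfiable([('A', 1), ('E', 2)], matrix))
print(qbf_satisfiable([('E', 2), ('A', 1)], matrix))
```

The two calls use the same matrix with the quantifiers swapped, which shows that the order of the prefix matters: the first formula is satisfiable, the second is not.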

Notice that we allow only for propositional formulas in conjunctive normal
form (CNF) as matrices of QBFs. Indeed, by applying standard CNF
transformations (see, e.g., [3]) it is always possible to rewrite a QBF into an equisatisfiable
one satisfying our restrictions.



3 QuBE Algorithm 

QuBE is implemented in C on top of SIM, an efficient SAT decider developed
by our group. A C-like high-level description of QuBE is shown in Figure 1. In
this Figure,

- φ is a global variable initially set to the input QBF.

- Stack is a global variable storing the search stack, and is initially empty.

- ∃, ∀, False, True, Undef, Null, UNIT, PURE, L-SPLIT, R-SPLIT are pairwise
distinct constants.

- for each atom x in φ, (i) x.mode is a variable whose possible values are
UNIT, PURE, L-SPLIT, R-SPLIT, and (ii) x.type is ∃ if x is existential, and ∀
otherwise.

- Extend(l) first pushes l and φ in the stack; then deletes the clauses of φ in
which l occurs, and removes l̄ from the others.

- Retract() pops the literal and corresponding QBF that are on top of the
stack: the literal is returned, while the QBF is assigned to φ.

- Simplify() simplifies φ till a contradictory clause is generated (line 6), or the
matrix of φ is empty (line 8), or no simplification is possible (lines 5, 14).

- ChooseLiteral() returns a literal l occurring in φ such that for each atom x
occurring to the left of |l| in the prefix of φ, x does not occur in φ, or x is
existential iff l is existential. ChooseLiteral() also sets |l|.mode to L-SPLIT.

- Backtrack(res): pops all the literals and corresponding QBFs (line 18) from
the stack, till a literal l is reached such that |l|.mode is L-SPLIT (line 19), and
either (i) l is existential and res = False (line 20); or (ii) l is universal and
res = True (line 21). If such a literal l exists, |l|.mode is set to R-SPLIT, and
l̄ is returned (line 22). If no such literal exists, Null is returned (line 23).

QuBE returns True if the input QBF is satisfiable, and False otherwise. It 
is easy to see that QuBE, like other QBF procedures (see, e.g., [4,5,6]), is a 
generalization of the Davis, Logemann, Loveland procedure (DLL) for SAT: 
QuBE and DLL have the same behavior on QBFs without universal quantifiers. 
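
The search loop of Figure 1 can be mirrored in a short, runnable Python sketch. It is an illustration under stated assumptions rather than QuBE's C code: the data representation (prefix as a list of `('E'|'A', atom)` pairs, clauses as sets of nonzero integers), the choice of the leftmost occurring atom in `choose_literal`, the use of `None` for both Undef and Null, and all function names are ours. Unlike Figure 1, `simplify` re-checks the termination conditions after every single unit or monotone assignment.

```python
def qube_solve(prefix, clauses):
    """Decide a closed QBF with a stack-based, DLL-style search."""
    qtype = {x: q for q, x in prefix}                  # x.type
    level = {x: i for i, (_, x) in enumerate(prefix)}  # position in the prefix
    mode, stack = {}, []                               # x.mode and the search stack
    phi = [frozenset(c) for c in clauses]

    def extend(lit):
        nonlocal phi
        stack.append((lit, phi))
        phi = [c - {-lit} for c in phi if lit not in c]

    def unit_literal():
        for c in phi:
            ex = [l for l in c if qtype[abs(l)] == 'E']
            # One existential literal; every other literal lies to its right.
            if len(ex) == 1 and all(level[abs(m)] > level[abs(ex[0])]
                                    for m in c if m != ex[0]):
                return ex[0]
        return None

    def monotone_literal():
        lits = {l for c in phi for l in c}
        for l in sorted(lits):
            if -l not in lits:
                return l if qtype[abs(l)] == 'E' else -l
        return None

    def simplify():
        while True:
            if any(all(qtype[abs(l)] == 'A' for l in c) for c in phi):
                return False                           # contradictory clause
            if not phi:
                return True                            # empty matrix
            lit, kind = unit_literal(), 'UNIT'
            if lit is None:
                lit, kind = monotone_literal(), 'PURE'
            if lit is None:
                return None                            # Undef
            mode[abs(lit)] = kind
            extend(lit)

    def choose_literal():
        atom = min((abs(l) for c in phi for l in c), key=level.get)
        mode[atom] = 'L-SPLIT'
        return atom

    def backtrack(res):
        nonlocal phi
        while stack:
            lit, phi = stack.pop()                     # Retract()
            atom = abs(lit)
            if mode[atom] == 'L-SPLIT' and res == (qtype[atom] == 'A'):
                mode[atom] = 'R-SPLIT'
                return -lit                            # try the other branch
        return None                                    # Null

    while True:
        res = simplify()
        lit = choose_literal() if res is None else backtrack(res)
        if lit is None:
            return res
        extend(lit)
```

On a prefix without universal quantifiers the sketch behaves like a plain DLL procedure: every backtrack is triggered by a False result and flips the most recent left split.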

4 QuBE Options 

Consider Figure 1. QuBE ver. 1.0 features backjumping, trivial truth, six dif- 
ferent branching heuristics, i.e., implementations of ChooseLiteral, and other 
control options. 

The backjumping procedure implemented in QuBE [7] is a generalization of
the conflict-directed backjumping procedure as implemented in SAT solvers. As
far as we know, QuBE is the only QBF solver with backjumping. Because of the
potential overhead, backjumping has to be enabled when compiling the system, 
while all the other heuristics and optimizations can be enabled/disabled using 
QuBE’s command line. 

QuBE’s command line is: 

qube [-tt] [-heuristics unit | bohm | jw2] [-length exists | all]
[-verbose] [-timeout <n1>] [-memout <n2>] <file-name>.

By default, after the simplifications following the branch on a universal
variable have been performed, QuBE checks whether the formula obtained from
φ by deleting universal literals is satisfiable. If it is, then φ is satisfiable [4].
This optimization can produce dramatic speed-ups, particularly on randomly
generated QBFs (see, e.g. [4]). The option -tt disables this check. Notice that
ours is an optimized version of “trivial truth” as described in [4], where the check
is performed at each branching node.
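
The trivial truth test itself is easy to state in code. The sketch below is a hypothetical illustration, not QuBE's implementation: it deletes the universal literals from every clause and checks the resulting propositional formula by brute-force enumeration, where a real solver would call its SAT procedure. The function name and the data representation are assumptions.

```python
from itertools import product


def trivially_true(prefix, clauses):
    """Return True if the matrix without universal literals is satisfiable."""
    existential = [x for q, x in prefix if q == 'E']
    reduced = [[l for l in c if abs(l) in existential] for c in clauses]
    # Brute-force SAT check over the existential atoms (illustration only).
    for values in product([False, True], repeat=len(existential)):
        val = dict(zip(existential, values))
        if all(any(val[abs(l)] == (l > 0) for l in c) for c in reduced):
            return True
    return False
```

A True answer proves the QBF satisfiable. A False answer is inconclusive, and the search has to continue: for example, the check fails on the matrix (x ∨ y)(¬x ∨ ¬y) with prefix ∀x∃y, although that QBF is satisfiable.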

QuBE’s branching heuristics have been inspired by the SAT literature. Our
current version includes the bohm, jw2 and unit heuristics. The behavior of these
heuristics depends on the notion of “length” of a clause. QuBE features two
definitions of length of a clause C: the number of literals in C (-length all) as
in [4,5], and the number of existential literals in C (-length exists) as in [6].
By combining these options, six different branching heuristics are possible.

The bohm and jw2 heuristics are, respectively, a generalization of Böhm’s
heuristic [8] and the “two-sided Jeroslow-Wang” heuristic [9] for SAT. The idea
behind bohm and jw2 is to choose literals that occur as often as possible in
the shortest clauses of φ. The hope is that by assigning such literals, we will
have the largest amount of simplification. The unit heuristic is based on the
one implemented in SATZ [10]. As opposed to bohm and jw2, the unit heuristic
tentatively assigns truth values to atoms in order to get the exact amount of
simplification caused by such assignments.
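
To make the role of the clause length concrete, here is a sketch of the textbook two-sided Jeroslow-Wang score with a pluggable length definition. It is an assumption that QuBE's jw2 follows this exact formula; the code only illustrates how the two -length settings change the weights. Each clause C contributes 2^(-length(C)) to every literal it contains, and an atom is scored by the sum for both of its literals.

```python
def clause_length(clause, existential, length="exists"):
    """Length of a clause under the two definitions discussed above."""
    if length == "all":
        return len(clause)
    return sum(1 for lit in clause if abs(lit) in existential)


def jw2_scores(clauses, existential, length="exists"):
    """Two-sided Jeroslow-Wang score of every atom occurring in the matrix."""
    weight = {}
    for c in clauses:
        w = 2.0 ** -clause_length(c, existential, length)
        for lit in c:
            weight[lit] = weight.get(lit, 0.0) + w
    atoms = {abs(lit) for lit in weight}
    return {x: weight.get(x, 0.0) + weight.get(-x, 0.0) for x in atoms}
```

Short clauses dominate the score under both definitions, but with "exists" a clause with many universal literals counts as short, so its literals are weighted more heavily than under "all".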

Independently of the particular branching heuristic used, if the selected
atom x is existential (resp. universal), QuBE tries x first if x has more (resp. fewer)
positive than negative occurrences in the matrix of φ. The idea is to maximize
the chances of showing φ’s satisfiability (resp. unsatisfiability) in the first branch.

The -verbose option enables printing search information during the
execution, including the variable x being assigned, its mode (whether it is a unit, a
pure, . . . ), and, in case it is an L-SPLIT, whether x or ¬x is tried first.

The -timeout <n1> and -memout <n2> options are used to limit the amount
of resources used by QuBE. Whenever QuBE exceeds <n1> seconds of CPU time
(resp. <n2> megs of RAM), its execution is halted.

5 QuBE Performances 

To evaluate QuBE’s performance, we compare it with decide [5], Evaluate [4],
QKN [11], and Qsolve [6]. According to our preliminary experimental results,
the options -heuristics bohm -length exists give good performance on all
the problems, and are thus the default. The tests were run on a Pentium III, 600MHz,
128MB RAM.

We consider sets of randomly generated QBFs. We generate QBFs according
to model A as described in [12]. In this model, each QBF has the following 4
properties: (i) the prefix consists of k sequences, each sequence has n quantifiers,
and any two quantifiers in a sequence are of the same type, (ii) the rightmost
quantifier is ∃, (iii) the matrix consists of l clauses, (iv) each clause consists of
h literals of which at least 2 are existential. Figure 2 shows the median, out
of 100 samples, of the CPU times when k = 2, n = 100 (left) and k = 3, n = 100
(right). We fixed h = 5 because it yields harder QBFs than h < 5 (see [12]), and
l (on the x-axis) is varied in such a way to cover the “100% satisfiable - 100%
unsatisfiable” transition (shown in the background). Notice the logarithmic scale
on the y-axis.

Fig. 2. CPU times, median, 100 samples/point. Background: satisfiability percentage.
Left: k = 2, n = 100. Right: k = 3, n = 100.
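
A generator satisfying the four properties of model A listed above can be sketched as follows. Several details are assumptions not stated in the text: quantifier types alternate between consecutive sequences, the atoms of a clause are distinct and drawn uniformly, signs are chosen with probability 1/2, and clauses with fewer than two existential literals are rejected and redrawn. The function name `model_a` is hypothetical.

```python
import random


def model_a(k, n, l, h, seed=None):
    """Generate a random QBF: k blocks of n quantifiers, l clauses of h literals."""
    rng = random.Random(seed)
    prefix = []
    for block in range(k):
        # Alternate the types so that the rightmost block is existential.
        q = 'E' if (k - 1 - block) % 2 == 0 else 'A'
        prefix += [(q, block * n + j + 1) for j in range(n)]
    existential = {x for q, x in prefix if q == 'E'}
    atoms = [x for _, x in prefix]
    clauses = []
    while len(clauses) < l:
        chosen = rng.sample(atoms, h)
        # Keep only clauses with at least 2 existential literals.
        if sum(x in existential for x in chosen) >= 2:
            clauses.append(frozenset(x if rng.random() < 0.5 else -x
                                     for x in chosen))
    return prefix, clauses
```

For instance, `model_a(k=3, n=100, l=400, h=5, seed=0)` produces one sample with a prefix of the shape used for the right-hand plot; the value of l here is arbitrary.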

Looking at Figure 2 (left) we immediately see that QuBE and Qsolve perform
roughly the same (with Qsolve being slightly better than QuBE) and
better than all the other solvers that we tested. Still in Figure 2, for k = 3
we further observe that QuBE is always faster than Qsolve, sometimes by orders
of magnitude. Since Qsolve runs trivial truth (and trivial falsity), but no
backjumping, we take this as evidence of the effectiveness of backjumping.

Our experimental analysis includes the 38 problems contributed by Rintanen 
in [5]. They are translations from planning problems into the language of QBFs 
and the best solver overall turns out to be decide with 33 problems solved and 
42.28s average running time (on solved samples), followed by QuBE, with 18 
problems solved and 73.21s, and Qsolve with 11 problems solved and 149.29s. 
QKN and Evaluate with, respectively, 1 and 0 problems solved, trail the list. 
In this regard, we point out that decide features “inversion of quantifiers” and 
“sampling” mechanisms which are particularly effective on these benchmarks.

Abstract. We describe the main characteristics of version 0.61 of the
E equational theorem prover. E is based on superposition (with literal 
selection) and rewriting. A particular strength of E is the ability to con- 
trol the proof search very well. This is reflected by a very powerful and 
flexible interface for the specification of clause selection functions, and 
by a wide variety of functions for the selection of inference literals. We 
discuss some important aspects of the implementation and demonstrate 
the performance of the prover by presenting experimental results on the 
TPTP. Finally, we describe our future plans for the system. 



1 Introduction 

E is a fully automatic theorem prover for clausal logic with equality. It is based
on a variant of the superposition calculus [BG94] and the DISCOUNT loop
proof procedure. The prover can read (and write) proof problems in its native,
PROLOG-like E-LOP syntax or in TPTP syntax. E is completely implemented
in ANSI C and compiles cleanly on most common UNIX versions.

This paper describes E 0.61, the direct successor of the version that won
the MIX category of CASC-17 [Sut01]. The prover has significantly changed
since the previous published description [Sch99a]. Major changes include the
introduction of literal selection, the addition of new generic clause evaluation
functions, support for using the prover as a preprocessor or clause set normalizer,
proof output and checking, and a much improved automatic mode. Due to space
constraints, we will restrict this discussion to the core functionality of the prover.

The complete distribution of E 0.61 is available on the Internet at [Sch99b].

2 Calculus

We only introduce a few essential terms. Clauses are multi-sets of literals,
usually written as disjunctions. Literals are either equations (positive literals) or
inequations (negative literals) over terms; a non-equational literal P(t1, . . . , tn)
is encoded as P(t1, . . . , tn) ≃ ⊤ for the special symbol ⊤.

The calculus SP [Sch00a,Sch00b] implemented by E is a variant of the
standard superposition calculus with selection [BG94]. E implements the
generating inference rules superposition (restricted paramodulation), equality
factoring (generalized factoring) and equality resolution. All generating inferences are
constrained by term ordering and literal selection as follows: If a clause has no
selected literals, inferences are performed on maximal literals. If, on the other 
hand, a clause has at least one selected literal, inferences are restricted to se- 
lected literals. In this case, the clause is not used to paramodulate into other 
clauses (although selected literals are a target for being paramodulated into). In 
all cases, only the maximal sides of a literal are used in the inferences. Literal 
selection is arbitrary, with the restriction that at least one of the literals selected 
in a clause has to be negative. Surprisingly, many of the best selection schemes 
also use the strictly unnecessary selection of at least one positive literal. 

In addition to generating inferences, simplification plays a major role in E. We 
use the obvious clause simplifications (deletion of duplicate and trivial literals), 
as well as unconditional rewriting, subsumption, tautology deletion, and the 
simplify-reflect inference that resolves a negative literal against a unit equation. 
The latest addition is AC-redundancy elimination, a special technique for dealing 
with associative and/or commutative function symbols. As shown in [AHL00]
(and implemented in the Waldmeister prover), most consequences of the AC 
axioms are superfluous in Knuth-Bendix completion. This result carries over to 
full superposition by using the general redundancy notion described in [BG94]. 
In E, we recognize and delete the corresponding clauses. Additionally, we can 
delete all negative literals in which both terms are equal modulo the recognized 
AC theory from clauses. 

3 Proof Procedure 

E implements the saturation of the clause set using the variant of the given- 
clause algorithm that was introduced in DISCOUNT [DKS97]. The core idea is 
to split the set of all clauses into a subset P of processed clauses and a subset U 
of unprocessed clauses. P is maximally simplified (with respect to clauses in P), 
and all generating inferences between clauses in P have been performed. Clauses 
in U are not directly used for any operations. 

Each traversal of the main loop of the algorithm picks the best clause from U
and simplifies it with P. If the clause is not redundant, it is used to back-simplify 
clauses in P and to generate new clauses. It is then added to P. New clauses are 
simplified once (to improve heuristic evaluation) and are added to U. 
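
The loop structure can be illustrated with a deliberately simplified, runnable Python sketch. It is not E's code and it swaps in a much weaker calculus: clauses are propositional (frozensets of nonzero integers), the only generating inference is binary resolution, simplification is reduced to subsumption and tautology deletion, and "best" means "smallest". All names are hypothetical.

```python
import heapq


def resolvents(c, d):
    """All non-tautological binary resolvents of clauses c and d."""
    out = []
    for lit in c:
        if -lit in d:
            r = (c - {lit}) | (d - {-lit})
            if not any(-m in r for m in r):
                out.append(frozenset(r))
    return out


def saturate(clauses):
    """Return True if the empty clause is derived, False on saturation."""
    processed = []                                             # P
    unprocessed = [(len(c), tuple(sorted(c))) for c in clauses]  # U
    heapq.heapify(unprocessed)
    while unprocessed:
        _, lits = heapq.heappop(unprocessed)       # pick the best clause from U
        given = frozenset(lits)
        if any(p <= given for p in processed):     # redundant with respect to P
            continue
        if not given:
            return True                            # empty clause: proof found
        # Back-simplify P, then generate inferences between given and P.
        processed = [p for p in processed if not given <= p]
        new = [r for p in processed for r in resolvents(given, p)]
        processed.append(given)
        for r in new:
            heapq.heappush(unprocessed, (len(r), tuple(sorted(r))))
    return False
```

As in the description above, clauses in U are never used as inference partners; they only become active once they are selected and moved to P.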

Typically, U is several orders of magnitude bigger than P. Therefore, only 
cheap operations are performed on clauses in U. More expensive ones, like de- 
tection of semantic tautologies, or non-unit subsumption, are only applied if a 
clause is selected for processing. By concentrating on P, we achieve a high rate of 
inferences as well as a fairly good locality of references, i.e. only a small number 
of clauses are typically used for each loop traversal. 

4 Search Control 

E offers a very wide range of options for the control of the proof search. The three 
major choice points are the selection of the next clause to process, the potential
selection of literals within a clause, and the selection of the term ordering used
to constrain the proof search.



4.1 Clause Selection 

The order in which clauses are selected for processing is the most important 
choice point for any theorem prover based on the given-clause algorithm. In E, 
this order is determined by a clause selection heuristic. Such a heuristic sets up 
a variety of priority queues and a weighted round-robin scheme that determines 
from which queue the next clause is to be selected. Precedence within each 
queue is determined, in this order, by a priority function that can e.g. prefer all- 
negative clauses, ground clauses, or initial axioms, and by an evaluation function 
that is typically based on symbol counting. Evaluation functions are created by 
instantiating one of about 15 different generic function templates. Completeness 
of the prover can be guaranteed by careful selection of priority and evaluation 
functions, or by simple addition of a fair queue (e.g. a FIFO-queue). In practice, 
most successful heuristics in E combine two evaluation functions (one specializing 
on potential goals, i.e. all negative clauses, and one for the remaining clauses) 
based on refined symbol counting (which assigns a higher weight to maximal 
terms and literals), and a FIFO queue. 
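
A weighted round-robin scheme over several priority queues, as described above, might look as follows. This is a sketch under our own assumptions and does not reproduce E's interface: each queue is given as a hypothetical triple (weight, priority function, evaluation function), every clause is inserted into every queue, and a clause picked through one queue is skipped lazily when it surfaces in another.

```python
import heapq
from itertools import count


class ClauseSelector:
    """Weighted round-robin over several priority queues of clauses."""

    def __init__(self, queues):
        # queues: list of (weight, priority_fn, evaluation_fn)
        self.queues = queues
        self.heaps = [[] for _ in queues]
        self.schedule = [i for i, (w, _, _) in enumerate(queues)
                         for _ in range(w)]
        self.step = 0
        self.ages = count()
        self.selected = set()

    def add(self, clause):
        age = next(self.ages)
        for heap, (_, priority, evaluate) in zip(self.heaps, self.queues):
            # Priority first, then evaluation; the unique age breaks ties.
            heapq.heappush(heap, (priority(clause), evaluate(clause, age),
                                  age, clause))

    def select(self):
        heap = self.heaps[self.schedule[self.step % len(self.schedule)]]
        self.step += 1
        while heap:
            _, _, age, clause = heapq.heappop(heap)
            if age not in self.selected:   # skip clauses taken via another queue
                self.selected.add(age)
                return clause
        return None


def goals_first(clause):
    """Priority function preferring all-negative clauses."""
    return 0 if all(lit < 0 for lit in clause) else 1


selector = ClauseSelector([
    (2, goals_first, lambda clause, age: len(clause)),  # symbol counting
    (1, lambda clause: 0, lambda clause, age: age),     # FIFO queue
])
```

With these weights, two out of every three selections come from the goal-directed queue, while the FIFO queue guarantees that every clause is eventually selected.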



4.2 Literal Selection 

As described in the calculus section above, inferences can be restricted to certain 
selected literals. E currently implements about 60 different strategies for the 
selection of literals, many of which are (more or less successful) experiments. 
Currently, these strategies are hard-coded in C. Since the necessary code changes 
to add a new strategy are quite minimal, there is no strong pressure to find a 
more abstract interface. 

The most successful strategies select negative ground literals before non- 
ground ones, and prefer literals with a large size difference between both terms. 
Moreover, it often is useful to refrain from selecting literals in clauses with a 
unique maximal literal or in range-restricted Horn clauses.



4.3 Term Orderings 

E uses two different ground-complete simplification orderings (which are lifted to
orderings on literals and clauses): the Lexicographic Path Ordering (LPO) and
the Knuth-Bendix Ordering (KBO). Both are parameterized by a precedence
on the function symbols; the KBO additionally requires a weight function for
function symbols and variables. The precedence can either be specified explicitly
on the command line, or one of several predefined schemes can be used. Similar
schemes exist for selecting the symbol weights.

4.4 Automatic Mode

The large flexibility in specifying search strategies for E makes the selection of 
an adequate set of options for any given problem fairly hard. We have therefore 
implemented an automatic mode that determines a search strategy based on 
the proof problem. The code realizing this automatic mode is generated from 
test results of the prover on the TPTP problem library [SSY94] as follows: We 
manually determine a partition of the set of test problems, induced by a variety
of features (presence of unit, Horn or general clauses, equality content, average
term size, . . . ) and order all heuristics by overall performance. A small program
traverses the set of heuristics in descending order and assigns to each class the
first (i.e. the most general) heuristic that achieves optimal performance in this
class. The output of this program is the C code implementing the automatic mode.
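
The assignment step can be sketched in a few lines of Python. This is an illustration under assumptions, not the program used for E: performance is taken to be the number of problems solved, the input is a hypothetical nested dictionary of test results, and the result is returned as a mapping instead of being emitted as C code.

```python
def assign_heuristics(results):
    """results: {heuristic: {problem_class: problems_solved}}."""
    # Order the heuristics by overall performance, best first.
    ranked = sorted(results, key=lambda h: sum(results[h].values()),
                    reverse=True)
    classes = {c for per_class in results.values() for c in per_class}
    assignment = {}
    for c in sorted(classes):
        best = max(results[h].get(c, 0) for h in ranked)
        # The first heuristic in the ranking that is optimal on this class.
        assignment[c] = next(h for h in ranked if results[h].get(c, 0) == best)
    return assignment
```

Because ties within a class are resolved in favor of the heuristic that ranks higher overall, the more general heuristic wins whenever two heuristics solve the same number of problems in a class.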

5 Implementation 

As stated in the introduction, E is implemented in ANSI C. The most outstanding
feature of the implementation is the use of shared terms with shared rewriting.
Except for short-lived temporary copies and terms with special restrictions
imposed by the calculus, no subterm is ever represented twice in the prover. 
Rewriting is done directly on the shared structure. 
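
The idea of shared terms (often called hash-consing) can be shown with a minimal term bank. The sketch below is hypothetical and far simpler than E's C data structures: a term is a pair of a symbol and a tuple of argument terms, and the bank returns the existing node whenever the same symbol is applied to the same shared arguments. Shared rewriting is not shown.

```python
class TermBank:
    """Store every distinct term exactly once."""

    def __init__(self):
        self._table = {}

    def term(self, symbol, *args):
        # Arguments are already shared, so their identity determines the term.
        key = (symbol, tuple(id(a) for a in args))
        node = self._table.get(key)
        if node is None:
            node = (symbol, args)
            self._table[key] = node
        return node

    def size(self):
        return len(self._table)


bank = TermBank()
a = bank.term("a")
t1 = bank.term("f", a, bank.term("g", a))
t2 = bank.term("f", bank.term("a"), bank.term("g", bank.term("a")))
print(t1 is t2, bank.size())
```

Building f(a, g(a)) twice yields the same object, and the bank holds only the three distinct subterms a, g(a) and f(a, g(a)).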

Like almost all other high-performance saturating provers, E uses indexing 
techniques to speed up common operations. In particular, E uses perfect dis- 
crimination trees with age and size constraints to speed up most simplifying 
unit operations: Subsumption, forward-rewriting, and simplify-reflect. We use 
normal form dates and rewritability flags on the shared terms to avoid dupli- 
cation of effort in both forward- and backward-rewriting. As current hot spots 
in the code do not involve unification or, for most proof problems, non-unit- 
subsumption, we have not yet implemented indexing for generating inferences 
and non-unit-subsumption. 

Due to the shared term representation, the construction of a proof object 
is fairly hard, as each rewriting step may affect an arbitrary number of usu- 
ally unknown clauses. We have added a post-processor that reconstructs the 
term/clause relationship during rewriting. The resulting proof objects are not 
yet very detailed, but can be checked for correctness using a proof checker. We 
can use either Otter, SPASS, or E itself to verify the validity of each deduction 
in a proof derivation. 

6 Performance 

The table below shows the number of proofs (and models) found by E 0.61 
(prerelease version) on all clause normal form problems from TPTP 2.3.0. The 
prover was running in automatic mode on a SUN Ultra 60 Workstation at 300 
MHz, with time limits of 100, 300 and 2000 seconds and a memory limit of 192 
MB.

If we compare E with other state-of-the-art theorem provers, we find that
E is particularly strong for Horn problems. It also is among the best general-
purpose systems for unit-equality problems. For non-Horn problems without
equality, however, the performance is below par. This may be due to the fact that
currently neither analytic features nor special inference rules for non-equational
literals are implemented. Finally, compared to e.g. SPASS, E is not very good at
finding models.



                     Time limit:      100 s           300 s                2000 s
Problem class         Class size  Proofs  Models  Proofs  Models  Proofs  Models   Total
Unit, no equality             11       8       3       8       3       8       3      11
Unit, equality               447     349       3     362       3     362       3     365
Horn, no equality            609     521       5     548       5     551       5     556
Horn, equality               507     357       3     389       3     397       3     400
General, no eq.              766     301      89     320      92     327      94     421
General, equality           1218     378       3     414       3     424       3     427
Overall                     3558    1914     106    2041     109    2069     111    2180



7 Future Work 

While the current prover already shows a quite satisfactory level of performance, 
our highest priority is still the improvement of the base prover. At the calculus 
level, we are planning to integrate clause splitting (as in Vampire) to achieve a 
better performance for non-Horn problems, and to integrate literal splitting for 
equational literals to simulate Waldmeister’s multiple normal form strategy for 
unit problems. At the control level, we will try to move from the current
feature-based classification of proof problems to the recognition of important algebraic
substructures (e.g. groups, rings, set theory, ...) to achieve a better adaptation to
the problem at hand.

In the longer term, we are planning to integrate the saturating prover with 
an analytic top-down component and to work on issues of proof analysis and 
presentation. 
Abstract. In this abstract we describe version 1.1 of the theorem prover
Vampire. We give a general description and comment on Vampire’s original
features and differences from the previously described version 0.0.



From the very beginning, the main research principle of Vampire was efficiency. 
Vampire uses a large number of data structures for indexing terms and clauses. 
Efficiency is still the most distinctive feature of Vampire. Due to reimplemen- 
tation of some algorithms and data structures, Vampire 1.1 is on the average 
considerably more efficient than Vampire 0.0. 

However, over the last year much effort was invested in flexibility: several new
inference and simplification rules were implemented, options for controlling the
proof search process were added, and new literal selection schemes were designed.

For the remaining time before IJCAR 2001, we are going to concentrate 
on adding more flexibility to Vampire, both for experienced and inexperienced 
users. 

1 General Description 

Vampire is a completely automatic saturation-based theorem prover for first- 
order logic with or without equality. 

Calculi. Two kinds of calculi are implemented: 

1. binary resolution with superposition and negative selection; 

2. positive and negative hyperresolution, but only for logic without equality. 

Saturation algorithm and splitting. Most of the existing first-order theorem 
provers implement either the OTTER-style saturation algorithm [6] or the DIS- 
COUNT-style algorithm [1] (see [8] for details). Vampire implements both al- 
gorithms and in addition an original algorithm based on the so-called limited 
resource strategy [8]. The DISCOUNT algorithm has been implemented recently. 
The main feature of this algorithm is that unused clauses are absolutely passive 
until they are selected for performing inferences, i.e. they are not eligible for any 
simplifications. 

Until recently, SPASS [13] was the only prover implementing a splitting rule
on top of a saturation algorithm. To avoid the high cost of implementing standard
splitting, we have implemented splitting without backtracking [10]. If the search
space contains a clause C ∨ D, where the variables of C and D are disjoint, we
replace this clause by two clauses C ∨ p and D ∨ ¬p, where p is a new propositional
symbol. There are several options controlling splitting in Vampire 1.1:

1. Blocking and parallel splitting are obtained by different modifications of the
literal selection function for clauses containing new predicate symbols.
These versions allow us to simulate, to some extent, sequential and parallel
case analysis. Suppose we split C ∨ D into C ∨ p and D ∨ ¬p. The blocking
extension of the selection function will select ¬p in D ∨ ¬p, thus “blocking”
inferences with the literals from D until ¬p is cut off by resolution. The
ordering on literals is adjusted in such a way that p is less than any literal
with a predicate from the original signature. Thus, unblocking D by resolving
with ¬p is postponed until the literals from C and the literals introduced by
resolving with them are all cut off. This roughly corresponds to the standard
sequential analysis of cases. In the parallel extension of the selection function,
literals with input predicates are always selected before any literals with new
predicates introduced by splitting.

2. New literals can be used as names of clauses. For example, if we split C ∨ D
into C ∨ p and D ∨ ¬p, we then consider ¬p as the name of C. The next
time we encounter a clause C ∨ D' such that the variables of C and D' are
disjoint, we simply replace this clause by D' ∨ ¬p.

3. A simplification rule called branch rewriting can be used. This rule is essentially
simplification by nonunit equalities of the form s = t ∨ P, where P
consists only of the new literals. Such a clause can be used to rewrite a clause
C[sθ] into C[tθ] under the condition P ⊆ C[sθ].
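
The basic splitting step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Vampire's implementation: literals are hypothetical triples (sign, predicate, arguments), variables are argument strings starting with an upper-case letter, and fresh propositional symbols are named sp1, sp2, and so on. The selection-function options and clause naming are not modelled.

```python
from itertools import count

_fresh = count(1)  # hypothetical supply of new propositional symbols


def variables(literal):
    """Variables are argument strings starting with an upper-case letter."""
    _sign, _pred, args = literal
    return {a for a in args if a[:1].isupper()}


def split(clause):
    """Split C v D (variable-disjoint) into C v p and D v ~p, else return None."""
    if len(clause) < 2:
        return None
    # Grow C from the first literal by adding every literal sharing a variable.
    c, rest = [clause[0]], list(clause[1:])
    c_vars = variables(clause[0])
    changed = True
    while changed:
        changed = False
        for lit in list(rest):
            if variables(lit) & c_vars:
                c.append(lit)
                c_vars |= variables(lit)
                rest.remove(lit)
                changed = True
    if not rest:
        return None  # only one variable-connected component: nothing to split
    p = f"sp{next(_fresh)}"
    return c + [(True, p, ())], rest + [(False, p, ())]


clause = [(True, "q", ("X",)), (True, "r", ("X", "Y")), (False, "s", ("Z",))]
print(split(clause))
```

Here q(X) ∨ r(X, Y) shares no variable with ¬s(Z), so the clause is replaced by q(X) ∨ r(X, Y) ∨ p and ¬s(Z) ∨ ¬p for a fresh p.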

Literal selection. Several literal selection strategies are now available. For example,
the user may choose to select only the maximal literals, or to select the maximal
literals first and then try to replace this selection by negative selection if that gives a
(heuristically) better selection. We have also added an option to use the inherited
negative selection used in the prover E [11]: after paramodulating into a clause
with a literal ¬A that was selected by negative selection, in the resulting clause
we will necessarily select the literal obtained from ¬A.

Orderings. Only one kind of term ordering is implemented in Vampire 1.1: a
nonrecursive version of the Knuth-Bendix ordering. For two ground terms s, t
we have s ≻ t if either the weight of s is greater than the weight of t, or the
weights are equal but s is lexicographically greater than t. The lexicographic
comparison is done on the words obtained by enumerating the symbols of the
terms visited in the left-to-right depth-first traversal. For nonground terms, the
ordering is defined in a similar way. Apart from some other pleasant properties,
this ordering enables efficient approximation of algorithms for solving ordering
constraints.
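
For ground terms, the comparison just described can be sketched in Python. The sketch makes two assumptions that are not stated in the text: every symbol has weight 1 unless a weight function is supplied, and the precedence on symbols is the alphabetical order of their names. Terms are nested tuples (symbol, argument, ...).

```python
def symbols(term):
    """Symbols of a ground term in left-to-right depth-first (prefix) order."""
    head, *args = term
    word = [head]
    for arg in args:
        word.extend(symbols(arg))
    return word


def greater(s, t, weight=lambda f: 1):
    """True if s > t: larger total weight, or equal weight and larger word."""
    ws = sum(weight(f) for f in symbols(s))
    wt = sum(weight(f) for f in symbols(t))
    if ws != wt:
        return ws > wt
    # Python compares lists lexicographically; symbol names act as precedence.
    return symbols(s) > symbols(t)


a, b = ("a",), ("b",)
print(greater(("f", b, a), ("f", a, b)))
print(greater(("g", a), b))
```

The first comparison is decided by the words f b a and f a b (equal weights), the second by weight alone.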
2 Other Original Features

Precompiled ordering constraints. In version 0.0 rewriting by a unit equality
s = t was allowed only when this equality was ordered, i.e. s ≻ t. In the new
version more simplifications by unit equalities can be done due to the introduction
of constrained rewrite rules. Now, if an equality s = t is not preordered, we can
still rewrite a clause C[sθ] into C[tθ] provided that sθ ≻ tθ.

In general, the check sθ ≻ tθ can be expensive. To avoid this expensive check,
we generate constraints which encode the sufficient and necessary conditions for
sθ ≻ tθ to hold. For example, for the commutativity axiom x · y = y · x and the
Knuth-Bendix ordering, instead of checking xθ · yθ ≻ yθ · xθ we can check a
simpler but equivalent condition xθ ≻ yθ. For the ordering used in Vampire, the
corresponding condition will be a simple lexicographic comparison of x and y.

Commutativity optimization. Special treatment of commutative functors is implemented.
Now, every subterm of a generated clause is normalized with respect
to commutativity of certain functions. If f is commutative, the term f(t_1, t_2) is
replaced by f(t_2, t_1) provided that t_2 is lexicographically smaller than t_1. Due to
our choice of the term ordering, this normalization does not violate completeness
since it can be interpreted as rewriting using the commutativity law with an ordering
constraint. Doing the normalization explicitly spares us the necessity of
using a general algorithm for solving the constraint.
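
A minimal Python sketch of this normalization is given below. It assumes that "lexicographically smaller" refers to the prefix-order symbol words of the two arguments, compared by symbol name; the function names are illustrative and not taken from Vampire.

```python
def prefix_word(term):
    """Symbols of a term (nested tuples) in left-to-right depth-first order."""
    head, *args = term
    return [head] + [sym for arg in args for sym in prefix_word(arg)]


def normalize(term, commutative):
    """Reorder the arguments of binary commutative symbols, innermost first."""
    head, *args = term
    args = [normalize(a, commutative) for a in args]
    if head in commutative and len(args) == 2:
        t1, t2 = args
        if prefix_word(t2) < prefix_word(t1):
            args = [t2, t1]  # f(t1, t2) becomes f(t2, t1)
    return (head, *args)


print(normalize(("g", ("f", ("c",), ("b",))), {"f"}))
```

Because subterms are normalized before their parents, two terms that differ only in the argument order of commutative symbols end up syntactically identical.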

Negative equality splitting. This rule may be applied in the preprocessing phase
and allows us to simulate an algorithm used in Waldmeister [5]. If the term s is
ground, the clause s ≠ t ∨ C can be replaced by the following two clauses: p(s)
and ¬p(t) ∨ C, where p is a new predicate symbol. This allows us to process s and
the rest of the clause separately. Apart from other things, this may lead to the
following addition-instead-of-multiplication effect. Suppose that s can be rewritten
by paramodulation in m different ways resulting in the terms s_1, ..., s_m, and
rewriting of t produces n versions t_1, ..., t_n. Without negative equality splitting
we would have to keep m · n clauses s_i ≠ t_j ∨ C, 1 ≤ i ≤ m, 1 ≤ j ≤ n, while
negative equality splitting allows us to keep only m + n clauses, namely p(s_i)
and ¬p(t_j) ∨ C.
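
The addition-instead-of-multiplication effect can be made concrete with a short Python sketch that simply enumerates the clauses kept in each case. The clause encoding (lists of tagged tuples) and the term names are hypothetical.

```python
from itertools import product


def without_splitting(s_variants, t_variants, rest):
    """One clause s_i != t_j v C for every pair (i, j)."""
    return [[("!=", s, t)] + rest for s, t in product(s_variants, t_variants)]


def with_splitting(s_variants, t_variants, rest):
    """Unit clauses p(s_i) plus clauses ~p(t_j) v C."""
    units = [[("p", s)] for s in s_variants]
    guarded = [[("~p", t)] + rest for t in t_variants]
    return units + guarded


s_variants = ["s1", "s2", "s3"]        # m = 3 rewritten versions of s
t_variants = ["t1", "t2", "t3", "t4"]  # n = 4 rewritten versions of t
rest = [("q", "X")]                    # the remaining literals C
print(len(without_splitting(s_variants, t_variants, rest)))
print(len(with_splitting(s_variants, t_variants, rest)))
```

With m = 3 and n = 4 this gives 12 clauses without splitting and 7 with it.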

Subsumption resolution. One of the simplification techniques that is heavily
used in Vampire, and to which it owes a great part of its power, is subsumption
resolution [3]. A clause ¬A ∨ C_1 can be replaced by C_1 if among the kept clauses
there is a clause C_2 subsuming A ∨ C_1.
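
For ground clauses, where subsumption reduces to the subset relation, the rule can be sketched in Python as follows. This is a simplified illustration only: literals are strings with "~" marking negation, the rule is applied to literals of either sign, and the first-order case (which needs matching substitutions) is not covered.

```python
def complement(lit):
    """Complement of a ground literal written as 'a' or '~a'."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def subsumption_resolution(clause, kept):
    """Cut off one literal if a kept clause subsumes the clause with it flipped."""
    for lit in clause:
        rest = clause - {lit}
        flipped = rest | {complement(lit)}
        if any(c2 <= flipped for c2 in kept):
            return rest
    return clause


clause = frozenset({"~a", "b", "c"})
kept = [frozenset({"a", "b"})]
print(sorted(subsumption_resolution(clause, kept)))
```

The kept clause a ∨ b subsumes a ∨ b ∨ c, so ¬a ∨ b ∨ c is shortened to b ∨ c.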

New algorithm for backward subsumption. One of the major improvements is a 
new indexing method for backward subsumption. Vampire 0.0 used an algorithm 
based on discrimination trees. This algorithm proved to be extremely slow on 
many problems. A new algorithm based on path indexing [12] and database joins 
was designed [9]. The first implementation of this algorithm was inefficient, but 
a couple of optimizations turned it into an extremely efficient one. The first key 
decision was to replace the set of (clause,literal,term) tuples in the leaves of the 
path index by a more suitable data structure called skip lists [7]. The second key 
decision was to change the order of evaluating joins depending on the sizes of the
sets. In addition, we added an optimization for handling symmetric predicates. 
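
The second decision, ordering joins by set size, can be illustrated by a small Python sketch. Plain Python sets stand in for the skip lists of the actual index, and the sets of clause identifiers are invented for the example; only the evaluation order is the point here.

```python
def join(candidate_sets):
    """Intersect sets of clause identifiers, evaluating the smallest sets first."""
    ordered = sorted(candidate_sets, key=len)
    if not ordered:
        return set()
    result = set(ordered[0])
    for s in ordered[1:]:
        if not result:
            break  # an empty intermediate result cannot grow again
        result &= s
    return result


print(join([{1, 2, 3, 4}, {2, 3}, {2, 3, 5, 9}]))
```

Starting with the smallest set keeps every intermediate result small and allows the join to stop early once it becomes empty.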



3 Future Developments 

In the future we will develop Vampire using the principles mentioned in the 
introduction: efficiency and flexibility, with the emphasis on the latter. 

Efficiency. We will continue experiments with indexing techniques and new algorithms
and data structures for the most important problems. In particular,
we will try to improve our indexing techniques for treating symbols with spe- 
cial properties, such as commutativity of functions and symmetry of predicates. 
Another expected change is a reimplementation of indexing for unification. 

Flexibility. So far Vampire has been very difficult to use: many options were
inherited from rather old versions and have become obsolete, and the user’s manual
was primitive. We are planning to make Vampire more user-friendly so it can be
used for two kinds of applications: (i) interactive use by experienced users who
can understand various proof-control options and use Vampire to prove hard
theorems; (ii) as a built-in subsystem of automated proof-assistants, interactive
provers, and verification systems. These will require enhancement of Vampire by
various features, such as new selection strategies, new simplification orderings
(the ordinary Knuth-Bendix ordering and the lexicographic path ordering, see e.g. [2]),
built-in AC, etc. We are going to provide an interface to interactive provers and
automatic proof-assistants, such as HOL and Isabelle.

Preprocessing. Currently, preprocessing of clauses is part of Vampire. We are 
going to implement preprocessing as a separate program that analyzes input, 
simplifies it, and calls Vampire with suitable options. A new clausifier will be 
implemented. 

Other. We are going to reimplement memory management in Vampire since currently
there are situations in which Vampire requests memory from the system
beyond the specified limit. We are planning to implement stratified resolution [4].

4 Availability 

Vampire is available free of charge. The authors will be glad to provide the 
newest versions of the system with necessary assistance to any interested party. 
The system can be run under Linux and Solaris. A temporarily unsupported port 
is also available for the Win32 platforms. The latest information about Vampire 
is available at http://www.cs.man.ac.uk/fmethods/vampire/. 
Abstract. We describe the theorem prover DCTP, which is an implementation
of the disconnection tableau calculus, a confluent tableau
method, in which free variables are treated in a non-rigid manner. In
contrast to most other free-variable tableau variants, the system can also
be used for model generation. We sketch the underlying calculus and its
refinements, and present the results of an experimental evaluation.



1 Introduction 

In this paper we present the theorem prover DCTP. It is based on the dis- 
connection tableau calculus [3], which is a clausal tableau calculus with some 
promising characteristics. The calculus is inherently cut-free, it provides a deci- 
sion procedure for a larger class of formulae than most other first-order calculi 
and, most importantly, it is proof confluent, so that it can be used for model 
generation. The extraction of a model from a saturated branch is one of the main 
motivations and advantages of the traditional semantic tableau approach. It is 
important to emphasize that this advantage is lost in contemporary free-variable
tableau calculi like connection tableaux or model elimination [6] or certain con- 
fluent variants of tableaux [1] or hypertableaux [2], in which free variables are 
treated in a rigid manner. 

In the paper, we describe the underlying calculus and its relation to other 
methods like clause linking [4]. Furthermore, we expound how the basic proof 
system can significantly be improved by extending it with clause simplification 
and the generation of unit lemmas, both in a top-down and a bottom-up man- 
ner. This also permits the use of unit subsumption. Also, the main implemen- 
tation decisions of the system are briefly sketched. We conclude with reporting 
on results of an experimental evaluation, which is quite encouraging for a first 
prototype. 

2 The Disconnection Tableau Calculus 

Essentially, the disconnection tableau calculus can be viewed as an integration 
of Plaisted’s clause linking method [4] into a tableau control structure. The orig- 
inal clause linking method works by iteratively producing instances of the input 
clauses, which are occasionally tested for unsatisfiability by a separate propositional
decision procedure. The use of a tableau as a control structure has two
advantages. On the one hand, the tableau format restricts the number of clause 
linking steps that may be performed. On the other hand, the tableau method 
provides a propositional decision procedure for the produced clause instances, 
thus making a separate propositional decision procedure superfluous. For the 
description of the proof method, we use the standard terminology for clausal 
tableaux. The disconnection tableau calculus consists of a single complex infer- 
ence rule, the so-called linking rule. 

Linking rule. Given a tableau branch B containing two literals K and
L in tableau clauses c and d, respectively, if there exists a unifier σ for the
complement of K and a variable-renamed variant Lτ of L, then successively
expand the branch with the two clauses cσ and dτσ as illustrated in Figure 1.

In other terms, we perform a clause linking step and attach the coupled instan- 
tiated clauses at the end of the current tableau branch. Afterwards, the respec- 
tive connection cannot be used any more on the branches expanding B, which 
explains the naming “disconnection” tableau calculus for the proof method. Additionally,
in order to be able to start the tableau construction, one must choose
an arbitrary initial active path through all the input clauses, from which the 
initial connections can be selected. This initial active path has to be used as a 
common initial segment of all proper tableau branches considered later on. 




Fig. 1. Illustration of a linking step.



As branch closure condition we use the same notion as employed in the clause
linking method. That is, a branch of a tableau is ground closed if it contains two
literals K and L such that Kσ is the complement of Lσ, where σ is a substitution
mapping all variables in the tableau to a new constant. Applied to the tableau
in Figure 1, this means that after the linking step at least the middle branch is
ground closed, as indicated with an asterisk. The disconnection tableau calculus
has the following properties. First of all, it is refutation complete for first-order 
clause logic. In order to find a proof, the linking rule must be applied in a 
fair manner. This is achieved by ensuring that any connection on a branch is 
eventually used on the branch. Second, the method provides a decision procedure 
for the Bernays-Schönfinkel class, i.e., for clause sets containing no function
symbols of arity > 0, just like the clause linking method or hypertableau methods 
like Satchmo. If the set of connections on a branch is exhausted, the literals on 
the branch describe a model for the clause set. In this respect, the method is 
superior to most other free-variable tableau calculi like model elimination, in 
which the free variables are treated in a rigid manner. Furthermore, in contrast 
to hypertableau calculi, in the disconnection approach the instantiation of clauses 
is fully guided by connections and no hidden form of Smullyan’s γ rule is needed.
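
The ground closure test used as branch closure condition can be sketched in Python. The encoding is an assumption made for illustration: literals are triples (sign, predicate, arguments), terms are strings or nested tuples, variables are strings starting with an upper-case letter, and "$c" plays the role of the new constant. DCTP itself is written in Scheme and uses indexing for this check.

```python
def ground(term, const="$c"):
    """Replace every variable in a term by the same new constant."""
    if isinstance(term, str):
        return const if term[:1].isupper() else term
    head, *args = term
    return (head, *(ground(a, const) for a in args))


def ground_closed(branch):
    """True if two literals on the branch become complementary after grounding."""
    seen = set()
    for sign, pred, args in branch:
        atom = (pred, tuple(ground(a) for a in args))
        if (not sign, atom) in seen:
            return True
        seen.add((sign, atom))
    return False


print(ground_closed([(True, "p", ("X", "a")), (False, "p", ("Y", "a"))]))
print(ground_closed([(True, "p", ("X", ("f", "Y"))), (False, "p", ("Z", ("f", "a")))]))
```

In the second branch the literals are unifiable but not ground closed, since f($c) and f(a) differ; such a pair forms a connection that can still be used for a linking step.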



3 Refinements 

As usual in the development of theorem provers, implementing a simple calculus 
in its pure form will not result in a competitive system. In order to improve the 
performance of the system, we have integrated a number of refinements, which 
preserve completeness and increase the performance of the disconnection tableau 
calculus tremendously. 

Pruning of clause variants. In a standard linking step two new clauses c and d 
are attached to the current tableau. In many cases, however, one of the attached 
clauses, say c, is a variant of a clause already on the branch. In this case, the 
branch can be expanded with the clause d only. 

Pruning of redundant branches. Since connections on a branch have to be used 
in a fair manner, it may happen that a closed subtableau is produced in which 
the top literal L is irrelevant. In this case the brother literals of L need not be 
solved. 

Unit simplification. The special treatment of unit clauses is one of the most 
successful methods to increase the performance of theorem provers. This also 
holds for the disconnection tableau calculus. When a clause has to be attached to 
a tableau branch, we can remove all literals whose complements are subsumed by 
unit clauses contained in the clause set. In the course of the tableau construction, 
this leads to shorter and shorter tableau clauses. 
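
Unit simplification can be sketched in Python with a simple one-way matching routine. The representation is hypothetical: a literal is a pair (sign, atom), atoms and terms are nested tuples or strings, and variables are strings starting with an upper-case letter. Variables of the clause being simplified are treated as constants during matching.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def match(pattern, term, subst):
    """One-way matching: extend subst so that pattern instantiated equals term."""
    if is_var(pattern):
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if isinstance(pattern, str) or isinstance(term, str):
        return subst if pattern == term else None
    if pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst


def unit_simplify(clause, units):
    """Drop every literal whose complement is an instance of a unit clause."""
    kept = []
    for sign, atom in clause:
        if not any(u_sign != sign and match(u_atom, atom, {}) is not None
                   for u_sign, u_atom in units):
            kept.append((sign, atom))
    return kept


units = [(True, ("q", "X"))]
clause = [(True, ("p", "a")), (False, ("q", ("f", "b")))]
print(unit_simplify(clause, units))
```

The literal ¬q(f(b)) is removed because its complement q(f(b)) is an instance of the unit clause q(X).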

Unit lemma generation. In the tableau framework new unit clauses can be gener- 
ated in two entirely different ways, in a top-down and a bottom-up manner. Top- 
down unit lemmas naturally result from unit simplifications, when the length of 
the tableau clause to be attached is 1. In such a case, instead of attaching the 
respective literal to the end of the current branch it is simply attached to the 
root of the tableau and may now be used by all other literals in the tableau. The 
bottom-up generation of units is more delicate. Assume a subtableau T with root 
literal L is ground closed without referring to literals on the branch above L. 
This amounts to the generation of a new unit lemma, namely, the complement of
Lσ where σ identifies all variables in L (just taking the complement of L would
be unsound because of the modified branch closure condition). This feature is a
special case of the so-called folding up procedure [5].

Unit subsumption. Once a large number of unit clauses are available, we can 
strongly profit from unit subsumption, i.e., tableau clauses are not attached 
when they are subsumed by a unit clause. As a matter of fact, subsumption 
deletion between non-unit clauses cannot be performed, since then no linking 
step could be applied at all. 

4 Implementation Issues 

The disconnection calculus employed by DCTP is proof confluent. This means 
we do not have to backtrack on inferences, i.e. we do not enumerate possible 
proof trees. We must, however, backtrack on the branches of our proof tree, as
even though all inferences performed on a closed branch remain fixed, we still 
need to solve all remaining open subgoals in the tableau. Backtracking within 
the proof tree, as opposed to a simple tail recursive iteration over open sub- 
goals, allows us to reconstruct the necessary environment, regarding for instance 
the existence and usability of the links on the branch, without explicitly stor- 
ing this information for each subgoal. The main loop of the system has two 
choice points, the selection of an open branch and the selection of a link on the 
branch for a linking step extension. Both of these selections are heuristically 
guided. The branch selection uses a depth-first strategy that selects the new 
branch depending on the instantiatedness of the leaf literals. The link selection 
favours links connecting short clauses. This way, a unit preference strategy is 
implemented. Additionally, the link selection considers the term complexity of 
the linked literal and thus guarantees the fairness of the proof procedure. The 
evaluation of the main loop is continued until either no open subgoals are left 
or the available links on a branch are exhausted. In order to improve the ef- 
ficiency when checking ground closedness, subsumption, or clause variants, we 
use indexing techniques for the storing of path literals, unit lemmas, and clause 
instances. All of the indexes use discrimination trees. In the current implementation,
we use no unification index; the generation of new links is performed by
checking the new open subgoal against the list of potentially linked path literals.
The theorem prover DCTP is implemented in the Scheme dialect bigloo
(http://kaolin.unice.fr/bigloo/bigloo.html). The system can be obtained
from http://wwwjessen.informatik.tu-muenchen.de/~letz/dctp.html.

5 Future Extensions 

We have not yet implemented special inference rules for equality literals. Conse- 
quently, the performance of the prover on problems containing equality is rather 
poor. While a satisfactory solution for equality handling using the model elimi- 
nation calculus or classical tableaux has yet to be found, we have good reason to 
believe that the disconnection calculus is compatible with reasonable adaptations 
of orderings and demodulation, since the calculus does not use rigid variables 
and unification is strictly local. We intend to implement a version of ordered
paramodulation in our system. 

6 Evaluation 

We have evaluated the system DCTP on the TPTP library. Since currently 
no reasonable equality handling is integrated in the system, we give results for 
all clause problems and problems without equality predicates. We compared 
DCTP with two other provers, the Scheme reimplementation of the Setheo prover 
and the latest version of the E prover [7]. The time limit for all tests was 300 
seconds per problem on a Sun Ultra 60 with 300 MHz processors and 384 MB 
of main memory. Apart from highlighting the excellence of E, the results given 
in Table 1 confirm that model generation is one of the strengths of our new 
system. Considering the entire TPTP problem library, an entirely new prover of 
course cannot compete with refined state-of-the-art systems. Still, the reasonable 
performance on satisfiable or groundable formulae shows that a disconnection 
calculus prover has considerable potential. 



Table 1. Solutions found by various provers within 300 seconds for all TPTP clause 
problems (left) and non-equality clause problems (right). 
Abstract. Proof assistants based on type theories, such as COQ and
Lego, allow users to omit subterms on input that can be inferred au- 
tomatically. While those mechanisms are well known, ad-hoc algorithms 
are used to suppress subterms on output. As a result, terms might be 
printed identically although they differ in hidden parts. Such ambiguous 
representations may confuse users. Additionally, terms might be rejected 
by the type checker because the printer has erased too much type infor- 
mation. This paper addresses these problems by proposing effective era- 
sure methods that guarantee successful term reconstruction, similar to 
the ones developed for the compression of proof-terms in Proof-Carrying 
Code environments. Experiences with the implementation in Typelab 
proved them both efficient and practical. 



1 Implicit Syntax 

Type theories are powerful formal systems that capture both the notion of com- 
putation and deduction. Particularly the expressive theories, such as the Calcu- 
lus of Constructions (CC) [CH88] which is investigated in this paper, are used 
for the development of mathematical and algorithmic theories since proofs and 
specifications are representable in a very direct way using one uniform language. 

There is a price to pay for this expressiveness: abstractions have to be decorated
with annotations, and type applications have to be written explicitly, because
type abstraction and type application are just special cases of λ-abstraction
and application. For example, to form a list one has to provide the element type
as an additional argument to instantiate the polymorphic constructors cons and
nil as in (cons ℕ 1 (cons ℕ 2 (nil ℕ))). Also, one has to annotate the abstraction of n in
λn:ℕ . n + 1 with its type ℕ although this type is determined by the abstraction
body. These excessive annotations and applications make terms inherently verbose
and thus hard to read and write. Without such explicit type information,
decidability of type-checking may be lost [Miq01, Wel99].

Proof assistants are programs that have primarily been built to support humans
in constructing proofs represented as λ-terms. They combine a (mostly)
interactive proof-development system with a type-checker that can check those

* This research has partly been supported by the “Deutsche Forschungsgemeinschaft” 
within the “Schwerpunktprogramm Deduktion” . 
proofs. Moreover, most proof assistants support a syntax that is more convenient
to use than the language of the underlying type theory. 

Besides some purely syntactic enhancements, such as the possibility to use
operations in infix notation, most systems allow the user to leave out subterms that the
system can infer as a remedy to the redundancy of explicit types. The explicit
language (a variant of CC in our case) is complemented by an implicit language 
defined in terms of the underlying type theory through some inference mecha- 
nism. From the viewpoint of the user, the implicit language acts as an alterna- 
tive grammar for the type theory. It improves the expressiveness of the original 
theory as more terms have types, but no additional types are inhabited. This 
pragmatic approach to simplify the user interface of proof assistants is referred 
to as implicit syntax [Pol90]. Motivated by the examples above, the inference 
mechanisms that we focus on in this paper are argument synthesis, to avoid ex- 
plicit polymorphic instantiations, and (partial) term reconstruction, to suppress 
annotations on abstractions. For these, we use the term elaboration. The inverse 
process that removes redundant subterms is called erasure. 

Ad-hoc argument synthesis, implemented in the proof assistant COQ [Bar99],
uses explicit placeholders to mark omitted subterms that should be inferred.
The above example can be written in COQ as (cons ? 1 (cons ? 2 (nil ?))) using
the placeholder symbol “?”. In addition, COQ supports the automatic insertion
of placeholders. This is done by analyzing the types of global constants. Parameters
occurring free in the type of at least one of the succeeding parameters
are assumed to be inferable (e.g., the first parameter of cons). We may write
(cons 1 (cons 2 (nil ?)))^1. To decide whether a term can be hidden on output or
not, the same oversimplified^2 analysis is used. This might lead to representations
of terms without unique elaborations. Especially if two terms are printed identically
although they are not identical internally, resulting from different hidden
arguments, users may get confused [Miq01]. Even in cases where the automatic
detection and erasure work correctly, the system tends to hide arguments one
wants to see explicitly although they would be inferable.
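
The criterion for automatic placeholder insertion described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not COQ's code: a constant is represented by its parameter list, each parameter by a hypothetical pair of its name and the set of variables occurring free in its type, and the types assumed for cons and nil are the usual polymorphic list constructors.

```python
def implicit_positions(params):
    """Indices of parameters occurring free in the type of a later parameter.

    params: list of (name, free variables of the parameter's type).
    """
    implicit = []
    for i, (name, _) in enumerate(params):
        if any(name in free for _, free in params[i + 1:]):
            implicit.append(i)
    return implicit


# Assumed signatures: cons : ΠA:Type . A → List A → List A, nil : ΠA:Type . List A
cons = [("A", set()), ("x", {"A"}), ("l", {"A"})]
nil = [("A", set())]
print(implicit_positions(cons), implicit_positions(nil))
```

Under these assumptions the first parameter of cons is classified as inferable, whereas the type argument of nil is not, which is why a placeholder still has to be written for the empty list.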

A finer control over implicit positions is possible through uniform argument
synthesis as implemented in Lego [LP92]. The user can mark parameter positions,
using abstractions of a second ‘color’, at which arguments can be left
implicit. Explicit arguments trigger the elaboration of arguments at preceding
hidden positions^3. To allow the specialization of polymorphic functions there is
a syntactic facility to override argument synthesis and to supply arguments
‘forced’ at implicit positions by preceding them with “!”. In the internal representation
of terms, Lego uses annotations that correspond to forced marks and
colored parameters of the user language. These annotations are used to decide
which arguments to hide on output. Unfortunately, there are cases in which
Lego, too, suppresses arguments that cannot be reconstructed. Furthermore, Lego
^1 Note that we still have to apply a placeholder to the empty list.

^2 A free occurrence of a parameter in another parameter type does not generally
guarantee a successful elaboration. One reason for this is that argument synthesis is
based on unification, which is not decidable in the higher-order case [Gol81].

^3 This rules out the inference of the type argument of nil.
sometimes does not hide arguments at marked positions, even when they are
inferable. Both defects result from the difficulty of properly defining reduction for a
language with forced arguments, since uniqueness of elaboration cannot be decided
locally [HT95].

This paper solves the problems and limitations caused by the usage of implicit
syntax in current proof assistants by proposing a stronger elaboration
algorithm and by complementing it with an erasure algorithm that guarantees
unique elaboration. The elaboration algorithm is stronger than the mentioned
ones since it allows the inference of implicit arguments at marked positions triggered
also by the outer term context, while doing universal argument synthesis
(e.g., our algorithm accepts (cons 1 nil)). In addition, the algorithm avoids the
inconvenience of having to attach type information to all abstracted variables
by doing (partial) term reconstruction. This allows the omission of type annotations^4
that can be inferred by propagating type information (e.g., the annotation
of n in (λn . n + 1)). The erasure algorithm does a global analysis of terms and
reconstructs forced marks (if necessary) instead of just propagating them.

The rest of this paper is organized as follows. After introducing a bicolored 
variant of CC in the next section, we present the elaboration algorithm (Sect. 3). 
Guided by the strategy of this elaboration algorithm, we develop in Sect. 4 an 
erasure algorithm that is both effective in removing most subterms and practical 
as shown by experimental results (Sect. 5). Finally, we report on the adaptation
to more realistic languages and comment on related work (Sect. 6).

2 Bicolored Calculus of Constructions 

The bicolored Calculus of Constructions (CC^bi), to be used as the explicitly typed
language, is a variant of the Calculus of Constructions (CC) [CH88]. Terms are
built up from a set V of variables, sort constants s ∈ {Prop, Type}, dependent
function types (Πx:A . B), λ-abstractions (λx:A . M) and function applications
(M N) as in CC. To this we add abstractions of the form Πx|A . B and λx|A . B
using the vertical bar as a color to mark implicit argument positions. If the color
is irrelevant we use the symbol “||” to stand for either a colon or a bar and we
abbreviate Πx:A . B by A → B if x does not occur free in B.

We denote the set of free variables of a term M by FV(M), the term resulting
from substituting N for each free occurrence of x in M by M[x:=N], and syntactic
equality, one-step β-reduction and β-conversion by ≡, →β and ≃,
respectively. For a term M the weak head normal form (whnf) is denoted by
M, the normal form by |M|, and the application head by head(M). As usual,
we will consider terms up to α-conversion.

The colors of abstractions have no essential meaning in the explicit language;
only the distinction is important. So, implicit λ-abstractions behave analogously
to the corresponding explicit variant with respect to reduction.

((λx|A . M) N) →β M[x:=N]

Note that even if some annotations can be left implicit we demand all variables to 
be introduced explicitly, to avoid confusions caused by typos [GH98]. 
The typing rules of CC are augmented by the following rules which differ only
in the coloring from the related uncolored rules of CC. 

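\[
\frac{\Gamma \vdash A : s_1 \qquad \Gamma, x{:}A \vdash B : s_2}
     {\Gamma \vdash \Pi x|A\,.\,B : s_2}
\qquad
\frac{\Gamma, x{:}A \vdash M : B \qquad \Gamma \vdash \Pi x|A\,.\,B : s}
     {\Gamma \vdash \lambda x|A\,.\,M : \Pi x|A\,.\,B}
\qquad
\frac{\Gamma \vdash M : \Pi x|A\,.\,B \qquad \Gamma \vdash N : A}
     {\Gamma \vdash (M\ N) : B[x:=N]}
\]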

The consistency of CC^bi under β follows immediately from that of CC. Note
that while CC is still confluent with respect to βη-reduction [Geu92], this
property is lost in the bicolored system. The term (λx:A . (λy|A . y) x) yields the
critical pair (λx:A . x, λy|A . y) under βη-reduction.



2.1 Unification Variables 

The elaboration algorithm, to be described below, maps partial terms^ (i.e.,
terms of the implicit language) to terms of CC^bi. A partial term M' is translated
to an open term M of the explicit language that contains unsolved unification
variables. These are solved in turn by unification during the elaboration process.

The explicit language extended by unification variables, which are syntactically
distinguished from other variables by a leading “?”, is a variant of the
one introduced by Strecker [SLvH98,Str99]. Unification variables are handled as
constants with respect to FV and reduction. For a term M, UV(M) denotes the
set of unification variables occurring in M. A unification variable depends on a
context Γ and has a type A, as expressed by the suggestive notation Γ ⊢ ?n:A.
Sort unification variables, Γ ⊢ ?n:*, are a special flavor of unification variables
where instantiation is restricted to terms of type Prop or Type.

An instantiation is a function, mapping a finite set of unification variables 
to terms. When instantiating a unification variable, it is replaced by the solu- 
tion term without renaming of bound variables. Instantiations are inductively 
extended to terms and contexts. 

There can be complex dependencies among unification variables in calculi
with dependent types. Therefore, a context and a type are not invariantly
assigned to a unification variable ?n, but are determined by the elaboration
state under consideration. An elaboration state ℰ consists of a finite set UV_ℰ of
unification variables, a function ctxt_ℰ assigning a context to each ?n ∈ UV_ℰ, and
a function type_ℰ assigning a type to each ?n ∈ UV_ℰ. Our elaboration algorithm
will only produce well-typed elaboration states ℰ, where the dependencies among
unification variables ?n ∈ UV_ℰ are not cyclic® and the typing constraints imposed
by ctxt_ℰ and type_ℰ are ‘internally consistent’.

For well-typed elaboration states reduction is confluent and strongly nor- 
malizing. Type inference and type checking are decidable and subject reduction 
holds [Str99]. 

® As a convention we prime metavariables that stand for terms of the implicit language, 
as M' , to distinguish them syntactically from those standing for fully explicit terms. 
® Note that unification variables do not have to be linearly ordered, as in the calculus
of Muñoz [Mun01], since this would restrict elaboration considerably.
2.2 Unification

Unification tries to find an instantiation that, when applied to two terms, makes
them equal. Equality is taken modulo convertibility; thus, the unification
problems we obtain will in general be higher-order. We use a ‘colored’ version of
the unification algorithm defined by Strecker [Str99]. It essentially carries out
first-order unification by structural comparison of weak head normal forms.
Unification judgments are of the form ⟨ℰ0; Γ ⊢ t1 ≈ t2⟩ ⇒ ℰ1; ι, which express that
the unification problem (Γ ⊢ t1 ≈ t2) can be solved by the instantiation ι, leaving
open the unification variables of ℰ1. The resulting elaboration state ℰ1 is
guaranteed to be well typed if the elaboration state ℰ0 is well typed. For presentation
purposes we use the simplified notation Γ ⊢ t1 ≈ t2 for unification judgments.
We assume that all instantiations are immediately applied to all terms and we
keep the elaboration states implicit.

For us, the key properties are that unification is decidable for unification
problems that produce only disagreement pairs of the simple rigid-rigid kind,
and that solvable unification problems of this kind have most general unifiers
(MGUs). Stronger decidable unification algorithms computing MGUs for the
pattern fragment of higher-order unification [Mil91] depend on the η-relation,
which we have to rule out to keep CC^bi confluent.
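
The structural comparison at the core of such a unifier can be sketched as
follows. This is a deliberately simplified, hypothetical model and not the
algorithm of [Str99]: terms are plain strings (variables and constants) or
tuples (applications), unification variables are strings with a leading “?”,
and contexts, colors, binders and reduction to whnf are all omitted. The names
unify, walk, occurs and resolve are illustrative assumptions.

```python
def is_uvar(t):
    # Unification variables are strings with a leading "?", as in the text.
    return isinstance(t, str) and t.startswith("?")

def walk(t, inst):
    # Follow the instantiation while t is an already solved unification variable.
    while is_uvar(t) and t in inst:
        t = inst[t]
    return t

def occurs(v, t, inst):
    # Occurs check: does the unification variable v appear in t?
    t = walk(t, inst)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, s, inst) for s in t)

def unify(t1, t2, inst=None):
    """Return an instantiation (dict) making t1 and t2 equal, or None."""
    inst = dict(inst or {})
    t1, t2 = walk(t1, inst), walk(t2, inst)
    if t1 == t2:
        return inst
    if is_uvar(t1):
        return None if occurs(t1, t2, inst) else {**inst, t1: t2}
    if is_uvar(t2):
        return None if occurs(t2, t1, inst) else {**inst, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        # Structural comparison of the immediate subterms, left to right.
        for s1, s2 in zip(t1, t2):
            inst = unify(s1, s2, inst)
            if inst is None:
                return None
        return inst
    return None  # clash between two different rigid terms

def resolve(t, inst):
    # Apply the instantiation to a term.
    t = walk(t, inst)
    if isinstance(t, tuple):
        return tuple(resolve(s, inst) for s in t)
    return t
```

In this toy setting, unify(("List", "?a"), ("List", "N")) solves ?a by N, while
a clash between two different constants makes the whole problem fail.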



3 Elaboration 



The implicit user language is an extension of the explicit language by Curry-style
abstractions of both colors and applications with forced arguments.

T ::= ... | λV|| . T | ΠV|| . T | (T n!T)    n ∈ ℕ

The value of n indicates the number of implicit arguments preceding the forced one.
The notation (M !N) abbreviates (M 0!N) and we write (M ||N) to subsume all variants
of applications. We assume a function @(M) that decomposes the term M into
its application head and the list of its arguments in reverse order. For example,
@(f a 3!b c) = (f, c :: 3!b :: a :: •).
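
A minimal sketch of such a decomposition function, under the assumption that
applications are represented by a hypothetical App node carrying an optional
forced mark (the class name, field names and the function name decompose are
illustrative and not taken from the paper):

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"
    forced: Optional[int] = None  # n for a forced argument n!N, None otherwise

Term = Union[str, App]

def decompose(term):
    """Return (head, args); args are (forced, argument) pairs in reverse order."""
    args = []
    while isinstance(term, App):
        # The outermost application carries the last argument, so collecting
        # from the outside in yields the argument list in reverse order.
        args.append((term.forced, term.arg))
        term = term.fun
    return term, args

# The example from the text: (f a 3!b c)
example = App(App(App("f", "a"), "b", forced=3), "c")
head, args = decompose(example)
```

Here head is "f" and args lists c first, then the forced argument 3!b, then a,
mirroring c :: 3!b :: a :: •.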

3.1 Bidirectional Elaboration Algorithm 

We present the algorithm at an abstract level through five judgments making
use of unification as introduced above. In- and output parameters are separated
by “⇒” and the flow of typing information is indicated by up and down arrows.

Main Elaboration            Γ ⊢ M' ⇒ M : N

Synthesis Mode (SM)         Γ ⊢ M' ⇒ M ↑ N

Argument Generation (AG)    Γ ⊢ L'; l ⇒ M : N

Checking Mode (CM)          Γ ⊢ M' ↓ N ⇒ M

Coerce to Type (CT)         Γ ⊢ M1 : N ↓ O ⇒ M2
Elaboration works in two distinct modes: SM, where type information is
propagated upward from subexpressions, and CM, where information is
propagated downward from enclosing expressions^. Unification variables are generated
for implicit arguments through the AG function and are solved later on by the
CT function which essentially calls unification.

The main elaboration judgment is a partial function taking a context Γ and a
partial term M', producing the elaboration M of M' and its type N. Elaboration
always starts in synthesis mode with an empty elaboration state ℰ, UV_ℰ = ∅.
As for unification, we keep instantiations and elaboration states implicit in the
presentation of rules and judgments.

In the rest of this section we define the four remaining judgments as a
collection of syntax-directed inference rules. The judgments are mutually dependent
according to the above control flow graph. All rules assume a valid context Γ
and check that context extensions maintain validity.

Elaboration, Synthesis Mode: Γ ⊢ M' ⇒ M ↑ N

The synthesizing (partial) elaboration function (Fig. 1) takes a context Γ and a
partial term M' and produces the corresponding explicit term M and its type
N. For example, the following judgment is derivable.

Γ ⊢ (cons 1 nil) ⇒ (cons ℕ 1 (nil ℕ)) ↑ (List ℕ)

The function essentially implements a colored version of the most natural
type inference algorithm for CC, known as the Constructive Engine [Hue89], and
is used if nothing is known about the expected type of an expression. It makes
use of the abbreviating judgment Γ ⊢ M' ⇒ M ⇑ N, which is identical to
the SM judgment but assumes N to be in whnf. Note that the syntactic test
in the side condition B ≢ Type is strong enough to ensure the Π-abstraction
to be well typed even in our calculus extended with unification variables. The
differences with respect to the Constructive Engine are the addition of the
Curry-style abstraction rules, (λ*↑) and (Π*↑), which generate new unification variables
for the missing abstraction types, and a modified application rule (@↑). This
application rule embodies a simple heuristic: always synthesize the type of the
function, and then use the resulting information to switch to checking mode for
the argument expression by calling the argument generation function.

Argument Generation: Γ ⊢ L'; l ⇒ M : N

Argument generation essentially calculates the type of the application head L'
and elaborates the list of arguments l under the expected parameter types in
checking mode (Fig. 2). If the type of the head is not functional (e.g., a
unification variable) elaboration fails. Unification variables are introduced at hidden
positions unless overwritten by a forced argument. The result is the elaborated
application M of type N. The example derivation from the last paragraph
contains a subderivation of the following argument generation judgment.

Γ ⊢ cons; nil :: 1 :: • ⇒ (cons ℕ 1 (nil ℕ)) : (List ℕ)

^ The basic idea of bidirectional checking is not new and is for example used by the
ML type-inference algorithm known as Algorithm M [LY98].
Fig. 1. Elaboration, Synthesis Mode




Fig. 2. Argument Generation 



Elaboration, Checking Mode: Γ ⊢ M' ↓ N ⇒ M

The checking mode, described by the rules of Fig. 3, is used if the surrounding
term context determines the type of the expression. The partial term M' is
elaborated to M under the given expected type N, which is assumed to be
in whnf. The resulting term M is guaranteed to be of type N. There is no
side condition B ≢ Type in the λ-abstraction rules, since the expected type is
known to be valid. Note further that the expected type in rule (@↓) cannot be
propagated down to elaborate the function part of an application since the result
type of a function in CC depends on the actual arguments and not only on their
number. Thus, to ensure soundness a final unification, essentially done by a call
to the coerce to type function, is necessary. The argument nil of the one-element
list (cons 1 nil) is elaborated in CM by a derivation of the following judgment.

Γ ⊢ nil ↓ (List ℕ) ⇒ (nil ℕ)

Coerce to Type: Γ ⊢ M1 : N ↓ O ⇒ M2

The coerce to type function tries to convert the given term M1 of type N, with
Γ ⊢ M1 : N, to a related term M2 of the expected type O (Fig. 4). The rule
(Unif) just checks if the given and the expected type are unifiable and therefore,
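\[
(\mathrm{Prop}{\downarrow})\quad
\frac{}{\Gamma \vdash \mathit{Prop} \downarrow \mathit{Type} \Rightarrow \mathit{Prop}}
\qquad
(\mathrm{Var}{\downarrow})\quad
\frac{\Gamma \vdash x \Rightarrow x \uparrow A \qquad \Gamma \vdash x : A \downarrow B \Rightarrow M}
     {\Gamma \vdash x \downarrow B \Rightarrow M}
\]
\[
(\lambda{\downarrow})\quad
\frac{\Gamma \vdash A' \Rightarrow A \uparrow s \qquad \Gamma \vdash A \approx A_e \qquad \Gamma, x{:}A \vdash M' \downarrow B \Rightarrow M}
     {\Gamma \vdash \lambda x\|A'\,.\,M' \downarrow \Pi x\|A_e\,.\,B \Rightarrow \lambda x\|A\,.\,M}
\]
\[
(@{\downarrow})\quad
\frac{\Gamma \vdash (M'\ \|N') \Rightarrow U \uparrow V \qquad \Gamma \vdash U : V \downarrow O \Rightarrow Q}
     {\Gamma \vdash (M'\ \|N') \downarrow O \Rightarrow Q}
\]
\[
(\lambda^{*}{\downarrow})\quad
\frac{\Gamma, x{:}A \vdash M' \downarrow B \Rightarrow M}
     {\Gamma \vdash \lambda x\|\,.\,M' \downarrow \Pi x\|A\,.\,B \Rightarrow \lambda x\|A\,.\,M}
\qquad
(\Pi^{*}{\downarrow})\quad
\frac{\Gamma \vdash\ ?a : * \qquad \Gamma, x{:}?a \vdash B' \downarrow s \Rightarrow B}
     {\Gamma \vdash \Pi x\|\,.\,B' \downarrow s \Rightarrow \Pi x\|?a\,.\,B}
\]
\[
(\Pi{\downarrow})\quad
\frac{\Gamma \vdash A' \Rightarrow A \uparrow s_1 \qquad \Gamma, x{:}A \vdash B' \downarrow s_2 \Rightarrow B}
     {\Gamma \vdash \Pi x\|A'\,.\,B' \downarrow s_2 \Rightarrow \Pi x\|A\,.\,B}
\qquad
(\mathrm{UV}{\downarrow})\quad
\frac{\Gamma \vdash M' \Rightarrow M \uparrow N \qquad \Gamma \vdash M : N \downarrow\ ?n \Rightarrow O}
     {\Gamma \vdash M' \downarrow\ ?n \Rightarrow O}
\]

Fig. 3. Elaboration, Checking Mode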



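\[
(\mathrm{Unif})\quad
\frac{\Gamma \vdash N \approx O}
     {\Gamma \vdash M : N \downarrow O \Rightarrow M}
\]
\[
(\mathrm{NTI})\quad
\frac{\Gamma \vdash (\Pi x|A\,.\,B) \not\approx O \qquad \Gamma \vdash\ ?n : A \qquad \Gamma \vdash (M\ ?n) : B[x:=?n] \downarrow O \Rightarrow P}
     {\Gamma \vdash M : \Pi x|A\,.\,B \downarrow O \Rightarrow P}
\]

Fig. 4. Coerce to Type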



the given term M1 does not have to be modified apart from resulting instantiations.
If this fails and the given type is an implicit Π-abstraction, a newly created
unification variable is applied to M1 through nil-type inference using rule (NTI)
and the result is recursively checked. In the other cases elaboration fails. While
this strategy enables the inference of the type argument of nil, it rules out the
possibility of collecting unification constraints first and solving them later. Solving
constraints immediately seems to be more efficient anyway [BN00].

The elaboration of the argument nil under the expected type (List ℕ) is
done using the rule (NTI), deriving the following judgment.

Γ ⊢ nil : ΠT|Prop . (List T) ↓ (List ℕ) ⇒ (nil ℕ)
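
The interplay of the two rules can be sketched in a few lines. The following is
a hypothetical toy model, not the Typelab implementation: every ("pi", x, A, B)
tuple stands for an implicit Π-abstraction Πx|A . B, other tuples are
applications, unification is purely structural (no reduction, no occurs check,
no α-conversion), and the function names coerce, unify, subst and resolve are
assumptions made for illustration.

```python
import itertools

_fresh = itertools.count()  # supply of fresh unification variable names

def is_uvar(t):
    return isinstance(t, str) and t.startswith("?")

def resolve(t, inst):
    # Apply the instantiation to a term.
    while is_uvar(t) and t in inst:
        t = inst[t]
    if isinstance(t, tuple):
        return tuple(resolve(u, inst) for u in t)
    return t

def unify(a, b, inst):
    # Purely structural toy unification; extends inst in place.
    a, b = resolve(a, inst), resolve(b, inst)
    if a == b:
        return True
    if is_uvar(a):
        inst[a] = b
        return True
    if is_uvar(b):
        inst[b] = a
        return True
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        return all(unify(u, v, inst) for u, v in zip(a, b))
    return False

def subst(t, x, s):
    # Naive t[x := s]; adequate here because s is a fresh unification variable.
    if t == x:
        return s
    if isinstance(t, tuple):
        if t[0] == "pi" and t[1] == x:  # x is rebound: only the domain changes
            return ("pi", x, subst(t[2], x, s), t[3])
        return tuple(subst(u, x, s) for u in t)
    return t

def coerce(term, ty, expected, inst):
    """Convert term : ty into a term of type expected, or raise TypeError."""
    trial = dict(inst)                   # do not keep bindings of a failed attempt
    if unify(ty, expected, trial):       # rule (Unif)
        inst.clear()
        inst.update(trial)
        return resolve(term, inst)
    if isinstance(ty, tuple) and ty[0] == "pi":   # rule (NTI)
        _, x, _dom, body = ty
        n = f"?n{next(_fresh)}"          # newly created unification variable
        return coerce(("app", term, n), subst(body, x, n), expected, inst)
    raise TypeError("elaboration fails: types are not unifiable")

# nil : ΠT|Prop . (List T), coerced to the expected type (List N)
nil_type = ("pi", "T", "Prop", ("List", "T"))
result = coerce("nil", nil_type, ("List", "N"), {})
```

With these assumptions, result is the application of nil to N, which corresponds
to the explicit term (nil ℕ) of the judgment above.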



3.2 Properties 

Proposition 1. If Γ ⊢ M1 : N ↓ O1 ⇒ M2 then Γ ⊢ M2 : O2 with O1 ≃ O2.

Proposition 2 (Soundness). If Γ ⊢ M' ⇒ M : N then Γ ⊢ M : N.

Proof. By (mutual) induction on the derivation trees of the elaboration and 
argument generation judgments using Proposition 1, correctness of unification, 
subject reduction and correctness of types. 
Proposition 3 (Partial Completeness). If Γ ⊢ M : N1 and M' corresponds
to the term M with forced applications at all implicit parameter positions then
Γ ⊢ M' ⇒ M : N2 and N1 ≃ N2.

Proof. Since the term M' has no missing subterms our algorithm never generates
unification variables. Therefore, all derivations can be translated into derivations
without CM judgments and unification reduces to conversion, leading to
derivations identical to those of the (bicolored) Constructive Engine, which is known
to be complete [Hue89].

Partial completeness essentially enables the user to give just as much ex- 
plicit type information as needed. This is necessary, because we cannot expect 
elaboration to be complete. 

Generally, the elaboration algorithm calculates only one of several possible
elaborations. For example, assuming a constant id of type ΠT|Prop . T → T the
partial term (id id 1) has the two elaborations (id (ℕ → ℕ) (id ℕ) 1) and
(id (ΠT|Prop . T → T) id ℕ 1) which are not convertible. Our algorithm would
generate the second elaboration.

4 Erasure 

The erasure algorithm is supposed to remove as many type annotations and
arguments at implicit positions from a given term as possible, without losing
any information. We propose an algorithm (Fig. 5) that mimics elaboration
in an abstract way to predict its behavior. The erasure judgments, of the form
Γ ⊢ M'; C ⇐ M, compute for a unification variable free term M, UV(M) = ∅, the
erasure M' and the set C of variables that are erased in CM on first occurrence. It
works again in one of two modes, m ∈ {SM, CM}, corresponding roughly to those
of the bidirectional elaboration algorithm. In contrast to elaboration, erasure
also works in synthesis mode if the expected type on elaboration would be an
open term, conservatively assuming it not to contain any structural information.
Only if the expected type on elaboration is known to be unification variable free
does erasure work in checking mode.

Type annotations of λ-abstractions are always left implicit if erasure is in
checking mode, since those can be read off the fully explicit expected type
without even generating a unification variable using rule (λ*↓) of Fig. 3. Other type
annotations are only left implicit if the first reference to the corresponding
abstraction variable x is erased in checking mode (i.e., x ∈ C). Both cases are
shown by the following two derivable judgments.

Γ, f:(ℕ → ℕ) → ℕ ⊢ f (λn . n); ∅ ⇐ f (λn:ℕ . n)        Γ ⊢ λn . n+1; {n} ⇐ λn:ℕ . n+1

Arguments at marked positions are left implicit if they are determined by the
first depending argument type in elaboration order or by the expected type, if
in CM (Fig. 6). Otherwise, erasure represents these arguments explicitly as forced
arguments.
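\[
(\mathrm{Prop})\quad
\frac{}{\Gamma \vdash \mathit{Prop}; \emptyset \Leftarrow \mathit{Prop}}
\qquad
(\mathrm{Var})\quad
\frac{}{\Gamma \vdash x; \emptyset \Leftarrow_{SM} x}
\qquad
(\mathrm{Var}^{*})\quad
\frac{}{\Gamma \vdash x; \{x\} \Leftarrow_{CM} x}
\]
\[
(\lambda^{*})\quad
\frac{\Gamma, x{:}A \vdash M'; C \Leftarrow_{CM} M}
     {\Gamma \vdash \lambda x\|\,.\,M'; C \Leftarrow_{CM} \lambda x\|A\,.\,M}
\qquad
(Q^{*})\quad
\frac{x \in C \qquad \Gamma, x{:}A \vdash M'; C \Leftarrow M}
     {\Gamma \vdash Qx\|\,.\,M'; C \Leftarrow Qx\|A\,.\,M}
\]
\[
(@)\quad
\frac{\Gamma \vdash L'; P; i; S; C \Leftarrow (M\ N); 0}
     {\Gamma \vdash L'; C \Leftarrow (M\ N)}
\qquad
(Q)\quad
\frac{\Gamma \vdash A'; C_1 \Leftarrow A \qquad \Gamma, x{:}A \vdash M'; C_2 \Leftarrow M \qquad x \notin C_2 \qquad C = C_1 \cup (C_2 \setminus FV(A))}
     {\Gamma \vdash Qx\|A'\,.\,M'; C \Leftarrow Qx\|A\,.\,M}
\]

where Q ∈ {λ, Π}.

Fig. 5. Bidirectional Erasure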



Argument erasure judgments are of the form Γ ⊢ M'; N; i; S; C ⇐ M; n,
where M is the application term to be erased and n is the number of additional
arguments. It calculates the erasure M' and the type N of M, the number i of
preceding implicit argument positions, a set S of positions which should be
erased in SM since an implicit argument has to be inferred from the
corresponding argument type, and the set C as described above. Note that arguments are
identified here by the number of arguments that follow them. The erasure mode of an
argument is computed from the set S as follows.

\[
\mathit{mode}(n, S) =
\begin{cases}
SM & \text{if } n \in S \\
CM & \text{else}
\end{cases}
\]

On erasing the term (cons ℕ 1 (nil ℕ)) a derivation of the following judgment
is constructed. The resulting set S = {1} determines the second argument, 1, to
be erased in SM while the last argument, (nil ℕ), can be erased in CM since
nothing has to be inferred from its type.

Γ ⊢ cons; ΠT|Prop . T → (List T) → (List T); 1; {1}; ∅ ⇐ (cons ℕ); 2



The calculation of the determining information source, if any, for arguments
at implicit parameter positions is done by the function dpos_m.

Definition 4 (Determining Position). The function dpos_m for a mode m, a
context Γ, a term T = Πx|A . B with |B| = Πx1||A1, ..., xl||Al . C,
C ≢ Πy||D . E, another term N with Γ ⊢ N : A and n ∈ ℕ is specified as follows.

\[
\mathit{dpos}_m(\Gamma, T, N, n) =
\begin{cases}
n - r & \text{if } \exists r \in \mathbb{N}\,.\ 0 < r \le \min(l, n),\ x \notin \bigcup_{i=1}^{r-1} FV(|A_i|)
        \text{ and } x \in SF(\mathrm{dom}(\Gamma), A_r) \\
\blacksquare & \text{if } m = CM,\ n \le l,\ x \notin \bigcup_{i=1}^{n} FV(|A_i|),\ T \not\simeq B[x:=N] \\
        & \text{and } x \in SF(\mathrm{dom}(\Gamma), \Pi x_{n+1}\|A_{n+1}, \ldots, x_l\|A_l\,.\,C) \\
\bigstar & \text{else}
\end{cases}
\]

The result of dpos_m is a number d ∈ ℕ indicating the determining argument,
or one of the symbols ■ or ★ if the argument can be inferred from the expected
type or cannot be inferred at all, respectively. The condition T ≄ B[x:=N]
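\[
(\mathrm{NoArg})\quad
\frac{\Gamma \vdash M : N \qquad \Gamma \vdash M'; C \Leftarrow M}
     {\Gamma \vdash M'; N; 0; \emptyset; C \Leftarrow M; n}
\ [M \not\equiv (M_1\ M_2)]
\]
\[
(\mathrm{Visible})\quad
\frac{\Gamma \vdash M'; \Pi x{:}A\,.\,B; i; S; C_1 \Leftarrow M; n+1 \qquad \Gamma \vdash N'; C_2 \Leftarrow_m N}
     {\Gamma \vdash (M'\ N'); B[x:=N]; 0; S; C_1 \cup (C_2 \setminus FV(M)) \Leftarrow (M\ N); n}
\ [m = \mathit{mode}(n, S)]
\]
\[
(\mathrm{Forced})\quad
\frac{\Gamma \vdash M'; \Pi x|A\,.\,B; i; S; C_1 \Leftarrow M; n+1 \qquad \Gamma \vdash N'; C_2 \Leftarrow_m N \qquad
      n \in S \ \lor\ \mathit{dpos}_m(\Gamma, \Pi x|A\,.\,B, N, n) = \bigstar}
     {\Gamma \vdash (M'\ i!N'); B[x:=N]; 0; S; C_1 \cup (C_2 \setminus FV(M)) \Leftarrow (M\ N); n}
\ [m = \mathit{mode}(n, S)]
\]
\[
(\mathrm{Implicit})\quad
\frac{\Gamma \vdash M'; \Pi x|A\,.\,B; i; S; C \Leftarrow M; n+1 \qquad
      \mathit{dpos}_m(\Gamma, \Pi x|A\,.\,B, N, n) = d \qquad n \notin S}
     {\Gamma \vdash M'; B[x:=N]; i+1; S \cup \{d\}; C \Leftarrow (M\ N); n}
\ [d \neq \bigstar]
\]

Fig. 6. Argument Erasure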



ensures the applicability of the rule (NTI) on elaboration by forcing the term M
applied to the argument N to change its type. This is for example not the case
for any term of type ΠT|Prop . T applied to its own type.

The definition of dpos_m depends on the set SF of free variables of a term that are
solvable by unification with a fully explicit term. This set is defined as follows,
assuming nothing is ever substituted for variables of the set V0 during unification.



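Definition 5 (Solvable and Dangerous Free Variables). The set of solvable
free variables, SF(V0, M), of a term M relative to variables V0 is defined
mutually dependent with the set of dangerous free variables, DF(V0, M), as follows.

\[
SF(V_0, M) = SF^{*}(V_0, M, \emptyset) \qquad DF(V_0, M) = DF^{*}(V_0, M, \emptyset)
\]

with

\[
SF^{*}(V_0, M, V) =
\begin{cases}
\{x\} & \text{if } M = x \text{ and } x \notin V_0 \cup V \\
SF^{*}(V_0, A, V) \cup SF^{*}(V_0, N, V \cup \{x\}) \setminus DF^{*}(V_0, A, V)
      & \text{if } M = Qx\|A\,.\,N,\ Q \in \{\lambda, \Pi\} \\
SF^{*}(V_0, M_1, V) \cup SF^{*}(V_0, M_2, V) \setminus DF^{*}(V_0, M_1, V)
      & \text{if } M = (M_1\ M_2),\ \mathit{head}(M_1) \in V_0 \cup V \\
\emptyset & \text{else}
\end{cases}
\]

and

\[
DF^{*}(V_0, M, V) = FV(|M|) \setminus SF^{*}(V_0, M, V)
\]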



The definition of the set SF mimics unification in that terms are kept
essentially in normal form through stepwise reduction to whnf. The condition
head(M1) ∈ V0 ∪ V ensures that the head of the term M remains unchanged
under application of any substitution, with the implication that all x ∈ SF have
residuals in every reduction of every substitution instance of M.
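
The mutual recursion of Definition 5 can be made concrete with a small sketch.
It rests on assumptions that are not part of the paper: terms are already in
normal form (so |M| = M and no reduction is performed), variables are strings,
applications are ("app", M1, M2), binders are ("lam" | "pi", x, A, N), and the
sorts are never reported as variables. Set difference is applied before union;
since SF* and DF* of the same term are disjoint, the other reading yields the
same sets.

```python
SORTS = {"Prop", "Type"}

def fv(t):
    # Free variables of a term (sorts are not variables).
    if isinstance(t, str):
        return set() if t in SORTS else {t}
    if t[0] == "app":
        return fv(t[1]) | fv(t[2])
    _, x, a, n = t                      # binder: ("lam" | "pi", x, A, N)
    return fv(a) | (fv(n) - {x})

def head(t):
    # Application head of a term.
    while not isinstance(t, str) and t[0] == "app":
        t = t[1]
    return t

def sf_star(v0, t, v):
    if isinstance(t, str):
        return set() if t in SORTS or t in v0 | v else {t}
    if t[0] in ("lam", "pi"):
        _, x, a, n = t
        return sf_star(v0, a, v) | (sf_star(v0, n, v | {x}) - df_star(v0, a, v))
    m1, m2 = t[1], t[2]
    if head(m1) in v0 | v:              # rigid head: unchanged by any substitution
        return sf_star(v0, m1, v) | (sf_star(v0, m2, v) - df_star(v0, m1, v))
    return set()                        # flexible head: nothing is solvable

def df_star(v0, t, v):
    return fv(t) - sf_star(v0, t, v)

def sf(v0, t):
    return sf_star(v0, t, set())

def df(v0, t):
    return df_star(v0, t, set())
```

For instance, sf({"List"}, ("app", "List", "T")) yields {"T"}, whereas a variable
that first occurs below a head outside V0 ∪ V is dangerous and is therefore not
reported as solvable even if it occurs again in a later argument.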

To illustrate the last two definitions consider the erasure of (nil ℕ) in CM, the
last argument of the above example. dpos_CM(Γ, ΠT|Prop . (List T), ℕ, 0) yields
■ since SF(dom(Γ), (List T)) = {T} assuming List ∈ dom(Γ). This allows us to
derive the following argument erasure judgment using rule (Implicit) of Fig. 6.

Γ ⊢ nil; (List ℕ); 1; {■}; ∅ ⇐ (nil ℕ); 0



4.1 Properties 

Proposition 6. If Γ ⊢ M1 : N, Γ ⊢ M2 : N, M1 ≃ M2, UV(M2) = ∅, Γ ⊢ x:A,
x ∈ SF(dom(Γ), M1), Γ ⊢ ?n:A then Γ ⊢ M1[x:=?n] ≈ M2 yields the most
general instantiation ι with ?n ∈ dom(ι).

It is always possible to reconstruct the original term from the erasure using
the elaboration algorithm of Section 3.

Proposition 7 (Invertibility). If Γ ⊢ M1 : N1 and Γ ⊢ M'; C ⇐ M1 then
Γ ⊢ M' ⇒ M2 : N2 with M1 ≃ M2.

Invertibility essentially holds, since all unification variables generated by
elaboration for erased subterms are solved by the first typing constraints they
participate in. It can be verified that all generated unification problems are such
that one of the terms to be unified does not contain any unification variable
and the other term contains only solvable ones, which are guaranteed to be
instantiated (Proposition 6).

Since type annotations on parameters can help to make expressions more
readable, serving as checked documentation, erasure prefers implicit arguments
over implicit annotations. Consider an (explicit) polymorphic function f of
type ΠT|Prop . (T → Prop) → Prop. The algorithm above computes the
erasure (f (λx:ℕ . ℕ)) rather than (f !ℕ (λx . ℕ)) for the explicit term
(f ℕ (λx:ℕ . ℕ)). Note further that two explicit terms that are structurally
equal can still have different erasures and that different explicit terms can lead
to the same erasure, but only if both terms occur in different term contexts.

5 Experimental Results 

We have implemented and tested several variants of the elaboration and erasure 
algorithms discussed above as part of the proof assistant Typelab [vHLS97]. For
evaluation purposes we analyzed terms of different sizes up to 15,000 abstract 
syntax tree nodes. The terms were arbitrarily selected from definitions and proofs 
of the standard Typelab library. 

We found that compression factors are independent of the fully explicit term
size. We therefore calculated the percentage reduction in the total size of all terms,
but separated the results for definitions from the results for proofs (Table 1).
On average, our combined erasure algorithm almost halves the size of the
representation, with compression being slightly more effective for proof terms.

It has to be asked whether one could find an erasure algorithm which yields 
much smaller representations. To answer this question we determined the argu- 
ments at implicit positions that our combined erasure algorithm presented ex- 
plicitly. We found that on average, terms could only be reduced by another 1.1% 
Table 1. Reduction in abstract syntax tree size

              Implicit Arguments   Implicit Annotations   Combination
Proofs        22.1%                32.3%                  49.4%
Definitions   24.7%                26.3%                  39.5%


Fig. 7. Effect of implicit syntax on the full turnaround time 



through blindly erasing all those arguments while only 20% of them had enough 
information to be reconstructed. Erasing in addition all remaining type annotations
from abstractions reduced the representation by another 7.6%, but none
of those implicit terms could be reconstructed. We conclude that our combined 
erasure algorithm removes the vast majority of redundant subterms, respecting 
the given color information, thus leaving little room for further improvement. 

To analyze the performance benefits gained using implicit syntax we mea- 
sured for all terms the times® needed for a full turnaround including parsing, 
elaboration, erasure and the final printing of the implicit representation. Since 
we found all timings to be linear in the size of the fully explicit term, we averaged 
the results again. Fig. 7 shows the results for the fully explicit representation 
compared with our combined implicit representation. The pies are sized by area, 
which corresponds to the absolute representation size. We can conclude that, 
in practice, the additional costs produced by the erasure algorithm are smaller 
than the savings gained from dealing with reduced representations. 



6 Discussion

We have presented algorithms that improve the usability of implicit syntax for 
proof assistants. Our inference algorithm is stronger than the one of COQ or
Lego, since it allows more subterms to be omitted. Furthermore, our erasure algo-
rithm generates only implicit representations that allow the reconstruction of 
the original terms, in contrast to the ad-hoc erasure algorithms implemented in 
COQ and Lego. The experimental results presented in the previous section pro- 
vide evidence that our algorithms, while still being efficient, save considerable 
bandwidth between the user and the system. 

To implement the algorithms of this paper for the assistant Typelab we had 
to consider additional aspects. Deciding when to expand notational definitions 
is subtle for unification algorithms. Stepwise expansion, as done by most proof 
assistants, may return a unifier which is not most general and hence renders 

® We found similar results about the space requirements. 
unification incomplete even for the first-order case [PS99] and thus would limit
our elaboration algorithm. To adapt the erasure algorithm to the language of
Typelab with recursive definitions, the set SF had to be restricted since
applications with recursively defined heads are potentially ‘unstable’ with respect to
substitution at recursive parameter positions.

The local erasure algorithm in this paper hides only arguments that can be
elaborated by local methods [PT98], while our elaboration algorithm allows the
global distribution of unification variables. Consider the polymorphic operation
append on lists of type ΠT|Prop . (List T) → (List T) → (List T) and a list l of
type (List ℕ). The implicit term (append nil l) is elaborated into the explicit
term (append ℕ (nil ℕ) l), but erasure would produce the wordy representation
(append (nil !ℕ) l). We have also implemented an erasure algorithm that
does not force elaboration to solve implicit arguments completely at the first
opportunity. It works with an additional erasure mode, CM*, where the expected
type is allowed to be an open term. This considerably more complex algorithm®
computes the optimal erasure (append nil l) for the explicit term above.

One drawback of implicit syntax is the enlarged trusted code-base of the 
proof-checker. Fortunately, for sensible applications internal terms can always 
be rechecked by a small trusted or even verified checker for the base calculus. 



6.1 Related Work 

Berghofer and Nipkow describe a dynamic compression algorithm for proof terms
in Isabelle [BN00]. Their algorithm searches for optimal representations by
essentially doing elaboration on erasure and seems not to be efficient enough for
interactive usage.

The problem of redundancy has also been addressed by Necula and Lee
[NL98] in the context of Proof-Carrying Code systems. They analyze the
combination of ad-hoc argument synthesis with implicit type annotations for canonical
first-order proof objects represented as fully applied LF terms in long βη-normal
form, given a fully explicit expected type. This special setting enables the
precomputing of large parts of the erasure for constants.

While a language generated by implicit syntax is natural to humans, it seems
rather difficult to give it a direct foundation. Hagiya and Toda [HT95] have
implemented an implicit version of CC directly using a typed version of β-reduction
defined on the implicit language. Several complicated syntactic restrictions have
to be imposed to ensure decidability of type inference and to avoid dynamic type
checking during reduction. On the theoretical side, Miquel defined a Curry-style
version of CC [Miq01]. Although the metatheory of this calculus is not yet fully
developed, it is strongly conjectured that type checking is undecidable. For that
reason, this calculus seems to be a poor basis for proof assistants.



Acknowledgments I thank Martin Strecker for developing large parts of the 
Typelab system. 

® Details are subject of current research. 
Abstract. In this paper, we present a syntax-directed termination and
reduction checker for higher-order logic programs. The reduction checker
verifies parametric higher-order subterm orderings describing relations
between input and output of well-moded predicates. These reduction con- 
straints are exploited during termination checking to infer that a specified 
termination order holds. To reason about parametric higher-order sub- 
term orderings, we introduce a deductive system as a logical foundation 
for proving termination. This allows the study of proof-theoretical prop- 
erties, such as consistency, local soundness and completeness and decid- 
ability. We concentrate here on proving consistency of the presented in- 
ference system. The termination and reduction checker are implemented 
as part of the Twelf system and enable us to verify proofs by complete 
induction. 



1 Introduction 

One of the central problems in verifying specifications and checking proofs about 
them is the need to prove termination. Several automated methods to prove 
termination have been developed for first-order functional and logic programs 
in the past years (for example [15,1]). One typical approach is to transform the 
program into a term rewriting system (TRS) such that the termination property 
is preserved. A set of inequalities is generated and the TRS is terminating if 
there exists no infinite chain of inequalities. This is usually done by synthesizing 
a suitable measure for terms. To show termination in higher-order simply-typed 
term rewriting systems (HTRS) mainly two methods have been developed (for a 
survey see [13]): the first approach relies on strict functionals by van de Pol [12], 
and the second one is a generalization of recursive path orderings to the higher 
order case by Jouannaud and Rubio [4]. 

In this paper, we present a syntax-directed method for proving termination 
of higher-order logic programs. First, the reduction checker verifies properties 
relating input and output of higher-order predicates. Using a deductive system 
to reason about reduction constraints, the termination checker then proves that 
the inputs of the recursive call are smaller than the inputs of the original call 
with respect to higher-order subterm orderings. Our method has been developed 
for the higher-order logic programming language Twelf [8] which is based on 
the logical framework LF [3]. Although Twelf encompasses pure Prolog, it has 
been designed as a meta-language for the specification of deductive systems and 
proofs about them. In addition to Prolog it allows hypothetical and parametric 
subgoals. As structural properties play an important role in this setting, higher- 
order subterm orderings have been proven to be very powerful (see Section 5). 

The principal contributions of this paper are two-fold: 1) We present a logical 
foundation for proving termination which is of interest in proving termination 
of first-order and higher-order programs. The logical perspective on reasoning 
about orders allows the study of proof-theoretical properties, such as consistency, 
local soundness and completeness and decidability. In this paper, we concentrate 
on proving consistency of the presented reasoning system by showing admissi- 
bility of cut. This implies soundness and completeness of the reasoning system. 
2) We describe a practical syntax-directed system for proving termination of 
higher-order logic programs. Unlike most other approaches, we are interested 
in checking a given order for a program and not in synthesizing an order for a 
program. The advantage is that checking whether a given order holds is more 
efficient than synthesizing orders. In the case of failure, we can provide detailed 
error messages. These help the user to revise the program or to refine the speci- 
fied order. The termination checker is implemented as part of the Twelf system 
and has been used successfully on examples from compiler verification (soundness 
and completeness proofs for stack semantics and continuation-based semantics), 
cut-elimination and normalization proofs for intuitionistic and classical logic, 
soundness and completeness proofs for the Kolmogorov translation of classical 
into intuitionistic logic (and vice versa). 

The paper is organized as follows: In Section 2 we give a representative Twelf 
program taken from the domain of compiler verification. Using this example we 
illustrate the basic idea of the termination checker. We review the background 
in Section 3. In Section 4 we outline the deductive system for reasoning about 
orders and prove consistency of the system. Finally, in Section 5 we discuss 
related work, summarize the results and outline future work. 

2 Motivating Example 

Our work on termination is motivated by induction theorem proving in the logical 
framework and its current limitations in handling proofs by complete induction. 
In this section, we consider a typical example from compiler verification [2] to 
illustrate our approach. 

Compilation is the automatic transformation of a program written in a source 
language to a program in a target language. Typically, there are several stages of 
compilation. Starting with a high-level language, the computational description 
is refined in each step into low-level machine language. To prove correctness of 
a compiler, we need to show the correspondence between the source and target 
language. In this example, we consider Mini-ML as the source language, and the 
language of the abstract machine as the target language. We only consider a 
small subset of a programming language in this paper. 

Mini-ML Syntax 
  expressions   e ::= e₁ e₂ | lam x.e 
  values        v ::= x | Lam x.e 

Abstract Machine Syntax 
  instructions  I ::= ret v | ev e | app₁ v e | app₂ v₁ v₂ 
  stack         S ::= nil | S; λv.I 

The Mini-ML language consists of lambda-abstraction and application. To 
evaluate an application e₁ e₂, we need to evaluate e₁ to some value Lam x.e', 
e₂ to some value v₂ and [v₂/x]e' to the final value of the application. Note, the 
order of evaluation of these premises is left unspecified. The abstract machine 
has a more refined computation model which is reflected in the instruction set. 
We not only have instructions operating on expressions and values, but also 
intermediate mixed instructions such as app₁ v₁ e₂ and app₂ v₁ v₂. Computation 
in an abstract machine can be represented as a sequence of states. Each state 
is characterized by a stack S representing the continuation and an instruction 
I and written as S#I. In contrast to the big-step semantics for Mini-ML, the 
small-step transition semantics precisely specifies that an application is evaluated 
from left to right. 




A computation sequence 

\[
S\#(\mathsf{ev}\ e_1\,e_2) \mapsto (S;\lambda v.\mathsf{app}_1\ v\ e_2)\#(\mathsf{ev}\ e_1) \mapsto^{*} \mathsf{nil}\#(\mathsf{ret}\ w)
\]

is represented in Twelf as (t_app @ D1) where t_app represents the first step 
of computation S#(ev e₁ e₂) ↦ (S; λv.app₁ v e₂)#(ev e₁) while D1 describes 
the tail of the computation (S; λv.app₁ v e₂)#(ev e₁) ↦* nil#(ret w). We will 
sometimes mix multi-step transitions ↦* with single step transitions ↦ with 
the obvious meaning. 

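An evaluation tree in the big step semantics 

\[
\frac{
  \begin{array}{c}\mathcal{P}_1\\ e_1 \hookrightarrow \mathsf{Lam}\ x.e_1'\end{array}
  \qquad
  \begin{array}{c}\mathcal{P}_2\\ e_2 \hookrightarrow v_2\end{array}
  \qquad
  \begin{array}{c}\mathcal{P}_3\\ [v_2/x]e_1' \hookrightarrow v\end{array}
}{e_1\,e_2 \hookrightarrow v}\ \textit{ev-app}
\]

is implemented as (ev_app P1 P2 P3). The leaves of the evaluation tree are 
formed by applications of the ev-lam axiom which is implemented as a constant 
ev_lam in Twelf. 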

To show that the compiler works correctly, we need to show soundness and 
completeness of the two semantics. We will concentrate on the first property. 
To prove soundness we show the following: if we start in an arbitrary state 
S#(ev e) with a computation S#(ev e) ↦* nil#(ret w) then there exists an 
intermediate state S#(ret v) such that e ↪ v in the Mini-ML semantics and 
S#(ret v) ↦* nil#(ret w). 

Theorem 1 (Soundness). 

For all computation sequences 𝒟 : S#(ev e) ↦* nil#(ret w) there exists an 
evaluation tree 𝒫 : e ↪ v and a tail computation 𝒟' : S#(ret v) ↦* nil#(ret w) 
such that 𝒟' is smaller than 𝒟. 

The proof follows by complete induction on 𝒟. We consider each computation 
sequence 𝒟 in the small step semantics and translate it into an evaluation tree 
𝒫 in the Mini-ML semantics and some tail computation 𝒟' which is a sub- 
derivation of the original computation 𝒟. This translation can be described by a 
meta-predicate sound which takes a computation sequence as input and returns 
an evaluation tree and a tail computation sequence. 

As a computation sequence can either start with t_lam or t_app transition, 
we need to consider two cases. If the computation sequence starts with a t_lam 
transition (t_lam @ D) then there exists an evaluation of lam x.e to Lam x.e 
by the ev_lam rule and a tail computation D. The interesting case is when the 
computation sequence starts with a t_app transition (t_app @ D1). 

\[
S\#(\mathsf{ev}\ e_1\,e_2) \mapsto \underbrace{(S;\lambda v.\mathsf{app}_1\ v\ e_2)\#(\mathsf{ev}\ e_1) \mapsto^{*} \mathsf{nil}\#(\mathsf{ret}\ w)}_{\texttt{D1}}
\]

We recursively apply the translation to D1 and obtain P1 which represents an 
evaluation e₁ ↪ v₁ and (S; λv.app₁ v e₂)#(ret v₁) ↦* nil#(ret w) 
as the tail computation sequence 𝒟'. By inversion using the t-ret and t-app1 
transition rules, we unfold 𝒟' and obtain the following tail computation sequence 

\[
(S;\lambda v.\mathsf{app}_1\ v\ e_2)\#(\mathsf{ret}\ v_1) \mapsto S\#(\mathsf{app}_1\ v_1\ e_2) \mapsto \underbrace{(S;\lambda v.\mathsf{app}_2\ v_1\ v)\#(\mathsf{ev}\ e_2) \mapsto^{*} \mathsf{nil}\#(\mathsf{ret}\ w)}_{\texttt{D2}}
\]

which is represented as (t_ret @ t_app1 @ D2). By applying the translation 
again to D2, we obtain an evaluation tree for e₂ ↪ v₂ described by P2 and 
some computation sequence 𝒟'' : (S; λv.app₂ v₁ v)#(ret v₂) ↦* nil#(ret w). 
By inversion using rules t-ret and t-app2, we know that the value v₁ represents 
some function Lam x.e' and 𝒟'' can be unfolded to obtain the tail computation 

\[
(S;\lambda v.\mathsf{app}_2\ (\mathsf{Lam}\ x.e')\ v)\#(\mathsf{ret}\ v_2) \mapsto S\#(\mathsf{app}_2\ (\mathsf{Lam}\ x.e')\ v_2) \mapsto \underbrace{S\#(\mathsf{ev}\ [v_2/x]e') \mapsto^{*} \mathsf{nil}\#(\mathsf{ret}\ w)}_{\texttt{D3}}
\]

which is represented as (t_ret @ t_app2 @ D3). Now we apply the translation 
for a final time to D3 and obtain an evaluation tree P3 for [v₂/x]e' ↪ v 
and some tail computation S#(ret v) ↦* nil#(ret w) which we refer to as 
D4. The final results of translating a computation sequence (t_app @ D1) are 
the following: The first result is an evaluation tree for e₁ e₂ ↪ v which can be 
constructed by using the ev-app rule and the premises e₁ ↪ (Lam x.e'), e₂ ↪ v₂ 
and [v₂/x]e' ↪ v. This step is represented in Twelf by (ev_app P1 P2 P3). As 
a second result, we return the tail computation sequence D4. 

The following Twelf program implements the described translation. Through- 
out this example, we reverse the function arrows writing A2 <- A1, instead of 
A1 -> A2 following logic programming notation. Since -> is right associative, <- 
is left associative. A more detailed discussion of this example is given in [7]. 

% sound translates a computation D into an evaluation P and a tail computation D'. 
sound : S # (ev E) =>* nil # (ret W) -> 
         eval E V -> S # (ret V) =>* nil # (ret W) -> type. 
%mode sound +D -P -D'. 

% Case t_lam: the tail computation is the rest of the sequence. 
s_lam : sound (t_lam @ D) ev_lam D. 

% Case t_app: each recursive call works on the tail returned by the previous one. 
s_app : sound (t_app @ D1) (ev_app P3 P2 P1) D4 
         <- sound D1 P1 (t_ret @ t_app1 @ D2) 
         <- sound D2 P2 (t_ret @ t_app2 @ D3) 
         <- sound D3 P3 D4. 

First the type of the meta-predicate sound is defined. It has three arguments: 
the computation S#(ev E) ↦* nil#(ret W) which is described as S # (ev E) 
=>* nil # (ret W), the evaluation E ↪ V which is represented as eval E V and 
the tail computation sequence S#(ret V) ↦* nil#(ret W) which is defined as 
S # (ret V) =>* nil # (ret W). 

The mode declaration %mode sound +D -P -D' specifies inputs and outputs 
of the defined predicate. When executed this program translates computations 
on the abstract machine into Mini-ML evaluations. Dependent types underlying 
this implementation guarantee that only valid computation sequences and 
evaluations are generated. The mode checker [11] verifies that all inputs are 
known when the predicate is called and all output arguments are known after 
successful execution of the predicate. 

To check that this program actually constitutes a proof, meta-theoretic 
properties such as coverage and termination need to be established. Termination 
guarantees that the input of each recursive call (induction hypothesis) is 
smaller than the input of the original call (induction conclusion). For 
termination checking the program needs to be well-moded. In addition, the user 
specifies which input arguments to consider and in which order they diminish. 
In the given example, we specify that the predicate sound should terminate in 
the first argument by %terminates D (sound D P D'). For reduction checking we 
specify an explicit order relation between input and output elements. In the 
example we say %reduces D' < D (sound D P D'). In general, we allow atomic, 
lexicographic ({Arg1, Arg2}) or simultaneous ([Arg1, Arg2]) subterm orderings. 

To show that a given program satisfies a given reduction constraint pattern, we 
proceed for each clause in two stages: First we extract a set Δ of reduction 
constraints from the recursive calls which can be assumed and the reduction 
constraint P of the whole clause which needs to be satisfied. Second, we prove 
that the set Δ implies the reduction constraint P. For proving termination of a 
given program, we also proceed in two stages: For each clause, and for each 
recursive call we first extract a set Δ of reduction constraints which are valid 
and a termination constraint P which characterizes the relation between the 
recursive call and the original call. Second, we prove that the set Δ implies 
the termination constraint P. For example, to show that the predicate sound 
terminates, we show the following properties: 

Reduction: %reduces D' < D (sound D P D') 

if D4 ≺ D3, (t_ret @ t_app2 @ D3) ≺ D2 and (t_ret @ t_app1 @ D2) ≺ D1 then 
D4 ≺ (t_app @ D1). 

Termination: %terminates D (sound D P D') 

1. D1 ≺ (app @ D1) 

2. if (ret @ app1 @ D2) ≺ D1 then D2 ≺ (app @ D1) 

3. if (ret @ app2 @ D3) ≺ D2 and (ret @ app1 @ D2) ≺ D1 then D3 ≺ (app @ D1). 

We use ≺ to represent the subterm order relation. In general we might have 
nested clauses which need to be checked recursively. Moreover, we generate para- 
metric reduction constraints for parametric sub-clauses. In Section 5 we give 
another example for checking termination and reduction. A more detailed expla- 
nation for extracting the termination and reduction properties can be found in 
[9]. In the remainder of the paper we will briefly explain the background theory 
and then discuss a deductive system for reasoning about structural orderings. 
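
The termination and reduction claims listed above for sound can be checked mechanically. The following Python sketch is an illustration only and is not the Twelf implementation. It assumes a hypothetical encoding in which atoms are strings and compound terms are tuples, and the helper names args, le, lt and seq are invented for this example. To keep the search finite, each assumption is used at most once along a proof path; this is sound, but it may miss derivations that a complete procedure would find.

```python
# Terms: an atom is a string; a compound term is a tuple (head, arg1, ..., argn).
# Hypothetical encoding: "t_ret @ D" becomes ("@", "t_ret", "D").

def args(term):
    """Immediate arguments of a term (atoms have none)."""
    return term[1:] if isinstance(term, tuple) else ()

def le(m, n, hyps):
    """m is structurally equal to n, or a strict subterm of n."""
    return m == n or lt(m, n, hyps)

def lt(m, n, hyps):
    """m is a strict subterm of n, given assumptions hyps.

    hyps is a tuple of pairs (a, b), each meaning "a is a strict subterm of b".
    """
    # Structural step: m is below (or equal to) some argument of n.
    if any(le(m, arg, hyps) for arg in args(n)):
        return True
    # Transitivity step: m <= a, a < b (assumed), b <= n.
    # The assumption is removed from the recursive calls so the search stops.
    for i, (a, b) in enumerate(hyps):
        rest = hyps[:i] + hyps[i + 1:]
        if le(m, a, rest) and le(b, n, rest):
            return True
    return False

def seq(*steps):
    """Right-nested sequence: seq('t_ret', 't_app1', 'D2') is t_ret @ (t_app1 @ D2)."""
    *init, last = steps
    for step in reversed(init):
        last = ("@", step, last)
    return last

# Reduction constraints obtained from the recursive calls of s_app.
h1 = (seq("t_ret", "t_app2", "D3"), "D2")
h2 = (seq("t_ret", "t_app1", "D2"), "D1")
goal = seq("t_app", "D1")

# Termination claims 1 to 3, followed by the reduction claim.
claims = [lt("D1", goal, ()), lt("D2", goal, (h2,)), lt("D3", goal, (h1, h2))]
reduction = lt("D4", goal, (("D4", "D3"), h1, h2))
```

All three termination claims and the reduction claim hold under this sketch, while a claim such as D2 being smaller than (t_app @ D1) without any assumption is rejected.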

3 Background 



The higher-order logic programming language we are working with is based on 
the logical framework LF [3]. The meta-language of LF is the λΠ-calculus. It 
is a three-level hierarchical calculus for objects, families, and kinds. Families are 
classified by kinds, and objects are classified by types, that is, families of kind 
type. 

Kinds    K ::= type | Πx:A.K              Signatures Σ ::= · | Σ, h:K | Σ, c:A 
Types    A ::= h M₁ … Mₙ | Πx:A₁.A₂       Contexts   Γ ::= · | Γ, x:A 
Objects  M ::= c | x | λx:A.M | M₁ M₂ 

We will use h for type family constants, c for object constants, and x for 
variables. Constants are introduced through a signature. Πx:A₁.A₂ denotes 
the dependent function type or dependent product: the type A₂ may depend on 
an object x of type A₁. Whenever x does not occur free in A₂ we may abbreviate 
Πx:A₁.A₂ as A₁ → A₂. Below we assume a fixed signature Σ. The types of 
free variables in a term M are provided by a context Γ. The equivalence ≡ is 
equality modulo βη-conversion. We will rely on the fact that canonical (i.e. long 
βη-normal) forms of LF objects are computable and that equivalent LF objects 
have the same canonical form up to α-conversion. We assume that constants and 
variables are declared at most once in a signature and context, respectively. As 
usual we apply tacit renaming of bound variables to maintain this assumption 
and to guarantee capture-avoiding substitutions. 
To illustrate the use of basic notation, we consider the representation of 
the abstract machine which was introduced in the last section. The operations 
application and lambda abstraction can be represented as canonical LF objects 
of type exp. Values, continuations, instructions and states are defined in a similar 
fashion. The evaluation derivation e ↪ v is represented by the judgement eval : 
exp -> val -> type in Twelf. Similarly, we can encode the one-step transition 
relation and the multi-step transition relation as judgements in Twelf. 



% Syntax of expressions and values. 
exp  : type. 
lam  : (val -> exp) -> exp. 
app  : exp -> exp -> exp. 
val  : type. 
lam* : (val -> exp) -> val. 

% Big-step evaluation judgement. 
eval   : exp -> val -> type. 
ev_lam : eval (lam E) (lam* E). 
ev_app : eval (app E1 E2) V 
          <- eval E1 (lam* E1') 
          <- eval E2 V2 
          <- eval (E1' V2) V. 



The capitalized identifiers that occur free in each declaration are implicitly 
Π-quantified. The appropriate type is deduced from the context during type 
reconstruction. The fully explicit form of the first declaration would be ev_lam: 
ΠE:val -> exp. eval (lam E) (lam* E). 



4 A Logical Approach to Termination 

4.1 Reasoning about Higher-Order Subterm Orderings 

In Section 2 we sketched the analysis of higher-order logic programs for termina- 
tion and reduction properties. Termination and reduction analysis is separated 
from reasoning about higher-order subterm relations. The analysis collects valid 
reduction properties as assumptions and states the ordering which needs to be 
satisfied under the assumptions. In this section we develop a formal inference 
system to check whether a set of valid reduction constraints implies an ordering 
constraint. For now, we consider only first-order subterm reasoning. An ordering 
constraint is either the ≺ subterm relation, the ≼ subterm relation or structural 
equivalence relation ≡. A context Δ is a set of ordering constraints. 

Context               Δ   ::= · | Δ, P 
Ordering constraints  P   ::= Arg₁ ≺ Arg₂ | Arg₁ ≼ Arg₂ | Arg₁ ≡ Arg₂ 
Arguments             Arg ::= M | {Arg₁, Arg₂} | [Arg₁, Arg₂] 

The reasoning system should exhibit a minimal set of desired properties such 
as transitivity reasoning, congruence closure for structural equality reasoning, 
and reasoning about λ-terms. The system for first-order subterm reasoning is 
given in Figure 1. It is similar to the sequent calculus formulation with right and 
left rules for each ordering relation. ≼ is defined in terms of ≺ and ≡. If the rule 
L≺ has no premises, i.e., the right-hand side is a constant c with no arguments, 
the hypothesis is contradictory and the conclusion Δ, M ≺ c ⟶ P is trivially true. 
Reasoning about structural orderings is inherently different from the usual 
reasoning with equality and inequality. Usually when reasoning about 
equalities/inequalities, we reason about the value of a term. For example, the 
value of h M₁ … Mₙ can be equal to the value of g N₁ … Nₖ where h and g denote 
different function symbols. When reasoning about subterms, we are only interested 
in the syntactic structure of a term. Therefore, a term h M₁ … Mₙ can never be 
structurally equivalent to g N₁ … Nₖ, if g ≠ h. If h M₁ … Mₙ ≡ g N₁ … Nₖ occurs 
in our assumptions, we can infer anything (L≡2). 

Fig. 1. First-order Subterm Relations 

This system is already expressive enough to prove termination of the trans- 
lation of small-step semantics into big-step Mini-ML semantics which is imple- 
mented by the sound predicate (see Section 2). One of the claims we need to prove 
during termination checking is the following: 

(ret @ app1 @ D2) ≺ D1 ⟶ D2 ≺ (app @ D1) 

The proof written in a bottom-up linear notation is as follows: 

2. (ret @ app1 @ D2) ≺ D1 ⟶ (ret @ app1 @ D2) ≺ D1     id 
   (ret @ app1 @ D2) ≺ D1 ⟶ D2 ≡ D2                    refl 
   (ret @ app1 @ D2) ≺ D1 ⟶ D2 ≼ D2                    R≼2 
   (ret @ app1 @ D2) ≺ D1 ⟶ D2 ≺ (app1 @ D2)           R≺2 
   (ret @ app1 @ D2) ≺ D1 ⟶ D2 ≼ (app1 @ D2)           R≼1 
1. (ret @ app1 @ D2) ≺ D1 ⟶ D2 ≺ (ret @ app1 @ D2)     R≺2 
   (ret @ app1 @ D2) ≺ D1 ⟶ D2 ≺ D1                    using 1, 2 
   (ret @ app1 @ D2) ≺ D1 ⟶ D2 ≼ D1                    R≼1 
   (ret @ app1 @ D2) ≺ D1 ⟶ D2 ≺ (app @ D1)            R≺2 





Termination and Reduction Checking for Higher-Order Logic Programs 409 



We can extend the system with rules for lexicographic orderings by defining 
left and right rules (see Figure 2). O₁ and O₂ are considered to be lexicographi- 
cally smaller than O₁' and O₂' if either O₁ is smaller than O₁' or O₁ is structurally 
equivalent to O₁' and O₂ is smaller than O₂'. This disjunctive choice is reflected in 
the two rules RLex≺1 and RLex≺2. If we assume O₁ and O₂ to be lexicograph- 
ically smaller than O₁' and O₂', then we need to be able to prove some ordering 
P under the assumption O₁ is smaller than O₁' and under the assumptions O₁ 
is structurally equivalent to O₁' and O₂ is smaller than O₂' (see LLex≺). The 
rules for ≼ and ≡ are straightforward. Similarly, we can define extensions for 
simultaneous orderings. Although we do not pursue other more complex struc- 
tural orderings for now, in general this approach can be also applied to define 
extensions for simplification orderings, multi-set orderings or recursive path or- 
derings. In this paper, we focus on extending the system to higher-order subterm 
relations. 
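
The disjunctive reading of the lexicographic order can be sketched in a few lines of Python. This is a minimal illustration without assumptions in the context, using a hypothetical encoding (atoms as strings, compound terms as tuples) and invented helper names; it is not the implementation in Twelf.

```python
# Terms: an atom is a string; a compound term is a tuple (head, arg1, ..., argn).

def args(term):
    """Immediate arguments of a term (atoms have none)."""
    return term[1:] if isinstance(term, tuple) else ()

def lt(m, n):
    """m is a strict structural subterm of n (no assumptions)."""
    return any(m == arg or lt(m, arg) for arg in args(n))

def lex_lt(pair, pair_prime):
    """{O1, O2} is lexicographically smaller than {O1', O2'}."""
    (o1, o2), (p1, p2) = pair, pair_prime
    # First alternative: the first components already decrease.
    # Second alternative: first components are structurally equal
    # and the second components decrease.
    return lt(o1, p1) or (o1 == p1 and lt(o2, p2))

first_decreases = lex_lt(("X", ("s", "Y")), (("s", "X"), "Y"))
second_decreases = lex_lt((("s", "X"), "Y"), (("s", "X"), ("s", "Y")))
neither = lex_lt((("s", "X"), ("s", "Y")), (("s", "X"), "Y"))
```

The two alternatives in lex_lt correspond to the two right rules for the strict lexicographic order: the second component may grow when the first one decreases, but it must decrease when the first components are structurally equal.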

In the setting of a dependently typed calculus, we face two challenges: First, 
we need to reason about orders involving higher-order terms. Second, we might 
synthesize parametric order relations due to parametric subgoals. When con- 
sidering higher-order terms, we need to find an appropriate interpretation for 
lambda-terms. This problem is illustrated by the following example. Assume the 
constructor lam is defined as lam: (exp -> exp) -> exp. We want to show 
that E a is a subterm of lam λx.E x where a is a parameter. In the informal 
proof we might count the number of constructors and consider E a an instance 
of λx.E x. Therefore we consider a term M a subterm of λx.N x if there exists 
a parameter instantiation a for x s.t. M is smaller than [a/x]N. We will use the 
convention that a will represent a new parameter, while a stands for an already 
defined parameter. To adopt a logical point of view, the λ-term on the left of a 
subterm relation can be interpreted as universally quantified and the λ-term on 
the right as existentially quantified. 
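
As a worked illustration of this reading, the claim that E a is smaller than lam λx.E x can be derived bottom-up by instantiating the bound variable x with the parameter a. The derivation below is a sketch under assumptions: it uses the rule names that appear elsewhere in this section (refl, R≼2, R≺ with the index of the chosen argument, and the right rule RR≼λ that instantiates a λ-term on the right of ≼), and their exact formulation is assumed rather than quoted.

```latex
\begin{align*}
&\cdot \longrightarrow E\,a \equiv E\,a
  && \text{refl}\\
&\cdot \longrightarrow E\,a \preccurlyeq E\,a
  && R{\preccurlyeq}_2\\
&\cdot \longrightarrow E\,a \preccurlyeq \lambda x.\,E\,x
  && RR{\preccurlyeq}\lambda \text{, since } [a/x](E\,x) = E\,a\\
&\cdot \longrightarrow E\,a \prec \mathsf{lam}\,(\lambda x.\,E\,x)
  && R{\prec}_1
\end{align*}
```

Reading from the last line upwards, the strict relation is reduced to the non-strict relation on the only argument of lam, the λ-term on the right is instantiated with a, and the remaining goal closes by structural equality.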



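\[
\frac{\Delta \longrightarrow O_1 \prec O_1'}
     {\Delta \longrightarrow \{O_1,O_2\} \prec \{O_1',O_2'\}}\ \mathit{RLex}{\prec}_1
\qquad
\frac{\Delta \longrightarrow O_1 \equiv O_1' \quad \Delta \longrightarrow O_2 \prec O_2'}
     {\Delta \longrightarrow \{O_1,O_2\} \prec \{O_1',O_2'\}}\ \mathit{RLex}{\prec}_2
\]
\[
\frac{\Delta \longrightarrow \{O_1,O_2\} \prec \{O_1',O_2'\}}
     {\Delta \longrightarrow \{O_1,O_2\} \preccurlyeq \{O_1',O_2'\}}\ \mathit{RLex}{\preccurlyeq}_1
\qquad
\frac{\Delta \longrightarrow \{O_1,O_2\} \equiv \{O_1',O_2'\}}
     {\Delta \longrightarrow \{O_1,O_2\} \preccurlyeq \{O_1',O_2'\}}\ \mathit{RLex}{\preccurlyeq}_2
\]
\[
\frac{\Delta \longrightarrow O_1 \equiv O_1' \quad \Delta \longrightarrow O_2 \equiv O_2'}
     {\Delta \longrightarrow \{O_1,O_2\} \equiv \{O_1',O_2'\}}\ \mathit{RLex}{\equiv}
\]
\[
\frac{\Delta, O_1 \prec O_1' \longrightarrow P \quad \Delta, O_1 \equiv O_1', O_2 \prec O_2' \longrightarrow P}
     {\Delta, \{O_1,O_2\} \prec \{O_1',O_2'\} \longrightarrow P}\ \mathit{LLex}{\prec}
\]
\[
\frac{\Delta, \{O_1,O_2\} \prec \{O_1',O_2'\} \longrightarrow P \quad \Delta, \{O_1,O_2\} \equiv \{O_1',O_2'\} \longrightarrow P}
     {\Delta, \{O_1,O_2\} \preccurlyeq \{O_1',O_2'\} \longrightarrow P}\ \mathit{LLex}{\preccurlyeq}
\]
\[
\frac{\Delta, O_1 \equiv O_1', O_2 \equiv O_2' \longrightarrow P}
     {\Delta, \{O_1,O_2\} \equiv \{O_1',O_2'\} \longrightarrow P}\ \mathit{LLex}{\equiv}
\]

Fig. 2. Lexicographic Extensions 

Another example is taken from the representation of first-order logic [6]. We 
can represent formulas by the type family o. Individuals are described by the 
type family i. The constructor ∀ can be defined as forall: (i -> o) -> o. 
We might want to show that A T (which represents [t/x]A) is smaller than 
forall λx.A x (which represents ∀x.A). Similarly, we might count the number 
of quantifiers and connectives in the informal proof, noting that a term t in 
first-order logic cannot contain any logical symbols. Thus we may consider A T 
a subterm of forall λx.A x as long as there is no way to construct an object 
of type i from objects of type o. A term M is smaller than a λ-term (λx.N) if 
there exists an instantiation T for x s.t. M is smaller than [T/x]N and the type 
of T is subordinate to the type of N. For a more detailed development of mutual 
recursion and subordination we refer the reader to R. Virga's PhD thesis [14]. 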



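Right rules: 

\[
\frac{\Delta \longrightarrow [a/x]M \equiv [a/x]N}
     {\Delta \longrightarrow \lambda x.M \equiv \lambda x.N}\ R{\equiv}\lambda
\]
\[
\frac{\Delta \longrightarrow [a/x]M \preccurlyeq N}
     {\Delta \longrightarrow \lambda x{:}A.M \preccurlyeq N}\ RL{\preccurlyeq}\lambda^{a}
\qquad
\frac{\Delta \longrightarrow M \preccurlyeq [a/x]N}
     {\Delta \longrightarrow M \preccurlyeq \lambda x{:}A.N}\ RR{\preccurlyeq}\lambda
\]
\[
\frac{\Delta \longrightarrow [a/x]M \prec N}
     {\Delta \longrightarrow \lambda x{:}A.M \prec N}\ RL{\prec}\lambda^{a}
\qquad
\frac{\Delta \longrightarrow M \prec [a/x]N}
     {\Delta \longrightarrow M \prec \lambda x{:}A.N}\ RR{\prec}\lambda
\]
\[
\frac{\Delta \longrightarrow [a/x]P}
     {\Delta \longrightarrow \Pi x.P}\ R\Pi^{a}
\]

Left rules: 

\[
\frac{}{\Delta, \lambda x.M \equiv h\,N_1 \ldots N_n \longrightarrow P}\ L{\equiv}_3
\qquad
\frac{}{\Delta, a \equiv h\,N_1 \ldots N_n \longrightarrow P}
\qquad
\frac{\Delta, [a/x]M \equiv [a/x]N \longrightarrow P}
     {\Delta, \lambda x.M \equiv \lambda x.N \longrightarrow P}
\]
\[
\frac{\Delta, [a/x]M \preccurlyeq N \longrightarrow P}
     {\Delta, \lambda x{:}A.M \preccurlyeq N \longrightarrow P}\ LL{\preccurlyeq}\lambda
\qquad
\frac{\Delta, M \preccurlyeq [a/x]N \longrightarrow P}
     {\Delta, M \preccurlyeq \lambda x{:}A.N \longrightarrow P}\ LR{\preccurlyeq}\lambda^{a}
\]
\[
\frac{\Delta, [a/x]M \prec N \longrightarrow P}
     {\Delta, \lambda x{:}A.M \prec N \longrightarrow P}\ LL{\prec}\lambda
\qquad
\frac{\Delta, M \prec [a/x]N \longrightarrow P}
     {\Delta, M \prec \lambda x{:}A.N \longrightarrow P}\ LR{\prec}\lambda^{a}
\]
\[
\frac{\Delta, [a/x]P \longrightarrow P'}
     {\Delta, \Pi x.P \longrightarrow P'}\ L\Pi
\]

Fig. 3. Higher-order Extensions 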



Reasoning about λ-terms cannot be solely based on ≺ and ≡, as neither 
[a/x]M ≡ λx.M nor [a/x]M ≺ λx.M is true. Therefore, we introduce a set 
of inference rules to reason about ≼ which are similar to the ≺ rules. Extensions 
to higher-order subterm reasoning are presented in Figure 3. As we potentially 
need different instantiations of the relation λx.M ≺ N when reading the infer- 
ence rules bottom-up, we need to copy λx.M ≺ N in Δ even after it has been 
instantiated. For simplicity, we assume all assumptions persist. Note that we only 
show the case for mutually recursive type families, but the case where type family 
a is subordinate to the type family a' can be added in a straightforward manner. 
For handling parametric order relations we add RΠᵃ and LΠ which are similar 
to universal quantifier rules in the sequent calculus. Similar to instantiations of 
λx.M ≺ N, we need to keep a copy of Πx.P after it has been instantiated. The 
weakening and contraction properties hold for the given calculus. 

Reasoning about higher-order subterm relations is complex due to instanti- 
ating A-terms and parametric orderings. Although soundness and decidability of 
the first-order reasoning system might still be obvious, this is non-trivial in the 
higher-order case. In this paper, we concentrate on proving consistency of the 
higher-order reasoning system. Consistency of the system implies soundness, i.e. 
any step in proving an order relation from a set of assumptions is sound. The 
proof also implies completeness i.e. anything which should be derivable from a 
set of assumptions is derivable. 



4.2 Consistency of Higher-Order Subterm Reasoning 



In general, the consistency of a logical system can be shown by proving cut 
admissible. 



\[
\frac{\Delta \longrightarrow P \qquad \Delta, P \longrightarrow P'}
     {\Delta \longrightarrow P'}\ \textit{cut}
\]

Δ usually consists of elements which are assumed to be true. Any P which can be 
derived from Δ is true and can therefore be added to Δ to prove P'. In our setting 
Δ consists of reduction orderings which have already been established. Hence, 
the reduction orderings are true independently from any other assumptions in 
Δ and they are assumed to be valid. The application of the cut-rule in the proof 
can therefore only introduce valid orderings as additional assumptions in Δ. 

Theorem 2 (Admissibility of cut). 

1. If 𝒟 : · ⟶ M ≡ M' and ℰ : Δ, M ≡ M' ⟶ P' then ℱ : Δ ⟶ P'. 

2. If 𝒟 : · ⟶ σM ≺ M' and ℰ : Δ, λx̄.M ≺ M' ⟶ P' then ℱ : Δ ⟶ P'. 

3. If 𝒟 : · ⟶ σM ≼ M' and ℰ : Δ, λx̄.M ≼ M' ⟶ P' then ℱ : Δ ⟶ P'. 

The substitution σ maps free variables to new parameters. In general, we allow 
the cut between σM ≺ N and λx̄.M ≺ N where σM is an instance of λx̄.M. 

However, we will not be able to show admissibility of cut directly in the given 
calculus due to the non-deterministic choices introduced by λ-terms. Consider, 
for example, the cut between 

\[
\mathcal{D} = \frac{\begin{array}{c}\mathcal{D}_1\\ \cdot \longrightarrow \sigma \circ [a/x]M \prec N\end{array}}
                   {\cdot \longrightarrow \sigma\,\lambda x.M \prec N}\ RL{\prec}\lambda^{a}
\qquad\text{and}\qquad
\begin{array}{c}\mathcal{E}\\ \Delta, \lambda x.M \prec N \longrightarrow P\end{array}
\]

We would like to apply inversion on ℰ; therefore we need to consider all 
possible cases of previous inference steps which lead to ℰ. There are three possible 
cases we need to consider: L≺, LR≺λᵃ and LL≺λ. Unfortunately, it is not 
possible to appeal to the induction hypothesis and finish the proof in the L≺ 
and LR≺λᵃ case. This situation does not arise in the first order case, because all 
the inversion steps were unique. In the higher-order case we have many choices 
and we are manipulating the terms by instantiating variables in λ-terms. 

The simplest remedy seems to be to restrict the calculus in such a way that we 
always first introduce all possible parameters, and then instantiate all Π-quan- 
tified orders and λx.M which occur on the left side of a relation. This means, 
we push the instantiation with parameter variables as high as possible in the 
proof tree. This way, we can avoid the problematic case above, because we only 
instantiate a λ-term in λx.M ≺ N, if N is atomic. 

Therefore, we proceed as follows: First, we define an inference system, in 
which we first introduce all new parameters. This means we restrict the applica- 
tion of the R≼1, R≼2, R≺i, RR≺λ and RR≼λ rules to only apply if the left hand 
side of the principal order relation ≺ or ≼ is already of base type. Similarly, we 
restrict the application of L≺, LL≺λ, LL≼λ, i.e. the rule only applies if the right hand 
side of the principal ordering relation is of base type. In addition, we show that 
the application of the identity rules can be restricted to atomic terms. Second, 
we show this restricted system is sound and complete with respect to the original 
inference system. Third, we show that cut is admissible in the restricted calculus. 
This implies that cut is also admissible in the original calculus. The proof pro- 
ceeds by nested induction on the structure of P, the derivation 𝒟 and ℰ. More 
precisely, we appeal to the induction hypothesis either with a strictly smaller 
order constraint P or P stays the same and one of the derivations is strictly 
smaller while the other one stays the same. For a more detailed development of 
the intermediate inference system and the proofs we refer to [9]. 

Using the cut-admissibility theorem, cut-elimination follows immediately. 
Therefore, our inference system is consistent. This implies that all derivation 
steps in the given reasoning system are sound. It also implies that the inference 
rules are strong enough to deduce as much as possible from the assumptions and 
hence the system is complete. 



5 Related Work and Conclusion 



Most work in automating termination proofs has focused on first-order lan- 
guages. The most general method for synthesizing termination orders for a given 
term rewriting system (TRS) is by Arts and Giesl [1]. One approach to proving 
termination of logic programs is to translate them into a TRS and show termination 
of the TRS instead. However this approach has several drawbacks. In general, 
a lot of information is lost during the translation. In particular, if termination 
analysis fails for the TRS, it is hard to provide feedback and re-use this failure 
information to point to the error in the logic program. Moreover, important struc- 
tural information is lost during the translation, and constructors and functions 
are indistinguishable. One of the consequences is that proving termination of the 
TRS often requires more complicated orders. This is illustrated using an exam- 
ple from arithmetic. Using logic programming we implement a straightforward 
version of minus and the quotient predicate quot. 







minus : nat -> nat -> nat -> type. 
%mode minus +X +Y -Z. 
m_z : minus X z X. 
m_s : minus (s X) (s Y) Z 
       <- minus X Y Z. 
%reduces Z <= X (minus X Y Z). 
%terminates X (minus X Y Z). 

quot : nat -> nat -> nat -> type. 
%mode quot +X +Y -Z. 
q_z : quot z (s Y) z. 
q_s : quot (s X) (s Y) (s Z) 
       <- minus X Y X' 
       <- quot X' (s Y) Z. 
%terminates X (quot X Y Z). 



Proving termination of quot is straightforward with the presented method. 
We first prove termination of minus. In addition we show that minus X Y Z 
satisfies the reduction constraint Z <= X. When we prove termination of quot, 
we can assume the reduction constraint X' ≼ X. As the reduction constraint 
X' ≼ X implies X' ≺ (s X), we have proved termination of quot. Note that only 
subterm reasoning is required to prove termination of quot, while other methods 
like Arts and Giesl's method for proving termination of the corresponding term 
rewriting system need a recursive path ordering. Another example is an algorithm 
to compute the negation normal form of a first-order logical formula, which uses 
higher-order functions (see [9]). We implemented this algorithm using two mutually 
recursive predicates. Termination of this algorithm can be proven based on subterm 
ordering, while the corresponding term rewriting system given in [5] requires a more 
complicated ordering like recursive path ordering. 
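
The role of the reduction constraint in this argument can be made concrete with a small Python sketch. It is an illustration only: natural numbers are modelled as non-negative Python integers instead of the constructors z and s, a failing query of the logic program is modelled by returning None, and the function bodies are a hand translation of the clauses above rather than anything produced by Twelf.

```python
def minus(x, y):
    """minus X Y Z: returns Z, or None when no clause applies."""
    if y == 0:                      # m_z : minus X z X.
        return x
    if x > 0:                       # m_s : minus (s X) (s Y) Z <- minus X Y Z.
        return minus(x - 1, y - 1)
    return None                     # no clause for minus z (s Y)

def quot(x, y):
    """quot X Y Z: returns Z, or None when no clause applies."""
    if y == 0:                      # both clauses require the form (s Y)
        return None
    if x == 0:                      # q_z : quot z (s Y) z.
        return 0
    rest = minus(x - 1, y - 1)      # q_s, first subgoal: minus X Y X'
    if rest is None:
        return None
    # Reduction constraint of minus: X' <= X. Together with X < (s X)
    # this shows that the recursive call is on a strictly smaller input.
    assert rest <= x - 1 < x
    inner = quot(rest, y)           # q_s, second subgoal: quot X' (s Y) Z
    return None if inner is None else inner + 1

six_by_two = quot(6, 2)
five_by_two = quot(5, 2)
```

In this sketch quot(6, 2) yields 3, while quot(5, 2) yields None because the subgoal minus z (s z) has no matching clause. The assertion inside quot is exactly the fact that the termination checker derives from the reduction constraint of minus.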

Although some of the underlying ideas in higher-order term rewriting systems 
(HTRS) are shared with the logical framework, there are two principal differ- 
ences: First, all arguments of a predicate are in canonical form and therefore are 
terminating. This additional restriction simplifies termination analysis in the 
logical framework. On the other hand, the dependently typed λΠ-calculus, on 
which the logical framework LF is based, allows the representation of hypothet- 
ical and parametric judgements which make termination and reduction analysis 
more challenging. Hypothetical and parametric judgements have in general no 
counterpart in HTRS and their translation to HTRS seems difficult. 

One approach which analyzes logic programs directly has been developed by
Plümer [10]. The idea is to construct a subgoal dependency graph and then show
that this graph is acyclic according to some ordering. Although this approach
works well for Prolog programs, it is not obvious how to extend this method to
a higher-order setting with parametric and hypothetical subgoals. In this paper
we propose a proof-theoretical foundation for termination checking of
higher-order logic programs. To infer that a specified ordering holds under a
set of assumptions, we introduced a deductive system to reason about structural
orderings. We focused on consistency of the presented reasoning system.
Consistency implies that anything we derive from the assumptions is sound.
Cut-elimination implies that the reasoning system is complete, i.e. everything
which should be derivable from the assumptions is in fact derivable. A valuable
advantage of this approach is its extensibility and its modularity. Similar to
lexicographic extensions, we can imagine extensions for simplification
orderings, multi-set orderings and recursive path orderings. In addition, our
method allows us to combine different structural orderings for different
predicates. This is unlike other termination methods, which require one
ordering for the whole dependency graph.
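
To give a feeling for the structural orderings involved, here is a minimal first-order sketch in Python. It is not the deductive system of this paper, which also covers higher-order terms, parameters and assumptions. It only shows the plain subterm ordering and its lexicographic extension on ground terms. The term encoding (a string for a constant, a tuple whose first component is the constructor name) and all function names are assumptions made for illustration.

```python
def subterms(t):
    """Yield t and every subterm of t."""
    yield t
    if isinstance(t, tuple):
        for arg in t[1:]:           # t[0] is the constructor name
            yield from subterms(arg)


def strict_subterm(s, t) -> bool:
    """True if s occurs strictly inside t."""
    return isinstance(t, tuple) and any(
        s == u for arg in t[1:] for u in subterms(arg))


def subterm_le(s, t) -> bool:
    """Non-strict subterm ordering."""
    return s == t or strict_subterm(s, t)


def lex_lt(ss, ts) -> bool:
    """Lexicographic extension of the strict subterm ordering."""
    for s, t in zip(ss, ts):
        if strict_subterm(s, t):
            return True
        if s != t:
            return False
    return False


zero = "z"
one = ("s", zero)
two = ("s", one)
```

With this encoding, strict_subterm(zero, two) holds, and lex_lt((one, one), (one, two)) holds because the first components are equal and the second one decreases.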

This paper builds on Rohwedder and Pfenning’s work on mode and termi- 
nation checking for higher-order logic programs [11]. Their termination checker 
requires a direct relationship between inputs of the recursive call and inputs of 
the original call without taking into account input and output relations. Reason- 
ing about orderings allows us to check proofs by complete induction such as the 
soundness proof discussed in this paper. The emphasis of their work has been 
the correctness of the termination checker with respect to the operational se- 
mantics of Twelf programs. Although we have not proven the correctness of the 
extended termination checker, we are expecting the proof to be a straightforward 
extension of their proof. 

One question not discussed in this paper is whether the system is decidable.
This question is not trivial as we can potentially instantiate
$\lambda$-terms and $\Pi$-quantified order relations which occur in the context
multiple times. One approach for proving decidability would be to show that we
can bound the number of instantiations needed.

Our system is implemented as part of Twelf, and efficiently checks programs
and proofs. Currently multiplicity is restricted to one, i.e. we instantiate
$\Pi$-quantified orderings and $\lambda$-terms occurring on the left-hand side
of a relation in the hypothesis just once. Although we can artificially
construct examples which require multiplicity more than one, we have not
encountered these cases in practice so far. If a higher multiplicity is needed,
an appropriate warning is returned. As our algorithm analyzes program clauses
directly, its behaviour is easy to understand. In the case of failure, our
implementation will point to the clause and argument where the error occurred.
This enables the user to either revise the program or strengthen the ordering.
In practice we have used the termination and reduction checker on examples from
compiler verification (soundness and completeness proofs for stack semantics
and continuation-based semantics), cut-elimination and normalization proofs for
intuitionistic and classical logic, and soundness and completeness proofs for
the Kolmogorov translation of classical into intuitionistic logic (and vice
versa). Currently, Rohwedder and Pfenning’s termination checker is used in the
automatic induction theorem prover. In the future, we plan to incorporate the
extended termination checker.



Acknowledgements. The author gratefully acknowledges numerous fruitful
[...] correctness.



* The code of all the examples mentioned in the paper can be found at
http://www.cs.cmu.edu/~bp/code.
Abstract. This paper outlines the interactive proof explanation system
P.rex, which adapts its explanations to the user and allows him to ask
questions or make requests at any time, to which it reacts flexibly. As a
generic system, it can be connected to different theorem provers. The
distribution is available via the P.rex home page at
http://www.ags.uni-sb.de/~prex.



1 The P.rex System 

P.rex is an interactive proof explanation system that adapts its explanations
to the user and flexibly reacts to his questions or requests. An overview of its
architecture is provided in Figure 1.

As a generic system, P.rex can be connected to different theorem provers,
namely by means of the formal language Twega for specifying proofs and
mathematical theories (cf. Section 2). Mathematical theories are organized in
a hierarchical knowledge base. Each theory in it may contain, for example,
axioms, definitions, and theorems along with proofs. A proof of a theorem can be
represented hierarchically in Twega such that the various levels of abstraction
are made explicit.

The central component of the system is the dialog planner (cf. Section 3). 
It is implemented in ACT-R [1], a goal-directed production system that aims to 
model human cognition. In ACT-R, declarative and procedural representations 
of knowledge are explicitly separated into the declarative memory and the proce- 
dural production rule base. The plan operators of the dialog planner are defined 
in terms of productions and the discourse plan is represented in the declarative 
memory. 

To explain a particular proof, the dialog planner first assumes the individual 
user’s supposed cognitive state by updating its declarative and procedural mem- 
ories from the data base of user models. Then, the dialog planner sets the global 
goal to show the proof. ACT-R tries to fulfill this goal by successively applying 
productions that decompose or fulfill goals. Thereby, the dialog planner not only 
produces a dialog plan, but also traces the user’s cognitive states in the course 
of the explanation. This allows the system both to always choose an explanation 
adapted to the user, and to react to the user’s interactions flexibly. 

The dialog plan is passed on to the presentation component. Currently, we use
a derivative of PROVERB’s micro-planner [8] to plan the internal structure of
the sentences, which are then realized by the syntactic generator TAG-GEN [9].
The uttered sentences are finally displayed on the interface, which also allows
the user to enter remarks, requests and questions at any time. An analyzer
receives the user’s interactions and passes them on to the dialog planner. In
the current stage, we use a simplistic analyzer that understands fifteen
predefined quasi-natural language interactions.

Fig. 1. The architecture of P.rex



2 The Representation of Mathematical Objects 

The calculus of constructions ($\lambda C$) [3] is a dependently typed lambda
calculus that was devised as a formalism to represent mathematics. Twega is an
implementation of $\lambda C$ with $\beta\eta$-conversion extended by two
additional features: signatures and constant definitions.* Its abstract syntax
is given as follows:

    Terms  $T ::= V \mid C \mid T\,T \mid \lambda V{:}T.\,T \mid \Pi V{:}T.\,T$

where $V$ and $C$ are infinite collections of variables and constants,
respectively. We write $A \to B$ for $\Pi x{:}A.\,B$ if $x$ does not occur in $B$.

To describe the basic judgments, we consider signatures, which contain only
constant declarations and definitions, and contexts, which contain only variable
declarations. The type system stratifies terms into three levels: objects, types,
and kinds. Let $\Sigma$ be a signature, $\Gamma$ a context, and $A$, $B$ terms.
The judgments are

    $\Gamma \vdash_\Sigma A : B$    $A$ and $B$ are valid terms and $A$ is of type $B$
    $\Gamma \vdash_\Sigma A = B$    $A$ is definitionally equal to $B$

The notion of definitional equality we consider here is $\beta\eta$-conversion.
[6] gives the complete definition of Twega.

In Twega, we employ a representation technique that is called
judgments-as-types [7]. This technique is characterized by mapping judgments to
types and their proofs to object terms, thus reducing the problem of proof
checking to the problem of type checking. A special type family that is indexed
by formulae is used to represent judgments as types. Inference rules are
represented as functions from judgments to judgments. Constant definitions
allow us to represent several levels of abstraction for a given proof.

* The implementation of Twega draws on Twelf [10], an implementation of the LF
logical framework [7], which is contained in $\lambda C$.

Table 1. A fragment of the representation of the ND calculus in Twega.

Types
    $i : \mathit{type}$    $o : \mathit{type}$    $\mathit{nd} : o \to \mathit{type}$

Function and Predicate Symbols
    $f : i \to \cdots \to i$    $P : i \to \cdots \to o$    $\mathit{true} : o$    $\mathit{false} : o$

Connectives
    $\mathit{and} : o \to o \to o$    $\mathit{imp} : o \to o \to o$

Inference Rules
    $\mathit{ander} : \Pi A{:}o.\ \Pi B{:}o.\ \mathit{nd}\ (\mathit{and}\ A\ B) \to \mathit{nd}\ B$
    $\mathit{impi} : \Pi A{:}o.\ \Pi B{:}o.\ (\mathit{nd}\ A \to \mathit{nd}\ B) \to \mathit{nd}\ (\mathit{imp}\ A\ B)$



Example 1. Table 1 gives the judgments-as-types representation of a fragment of
the ND calculus in Twega. Note that the type family $\mathit{nd}$ serves to
represent ND judgments. The following constant definition represents a derived
inference rule $\wedge$Comm that expresses the commutativity of the conjunction:

    $\mathit{andcomm} = \lambda A{:}o.\ \lambda B{:}o.\ \lambda u{:}\mathit{nd}\ (\mathit{and}\ A\ B).\ \mathit{andi}\ B\ A\ (\mathit{ander}\ A\ B\ u)\ (\mathit{andel}\ A\ B\ u)$
    $\qquad : \Pi A{:}o.\ \Pi B{:}o.\ \mathit{nd}\ (\mathit{and}\ A\ B) \to \mathit{nd}\ (\mathit{and}\ B\ A)$

The $\lambda$-term is called the expansion of andcomm and represents the
derivation of the inference rule, whereas its type (i.e., the $\Pi$-term)
represents the inference rule itself. Now, let us consider the following ND
proof of the theorem $P \wedge Q \supset Q \wedge P$:



$$\dfrac{\dfrac{[\,\vdash_{ND} P \wedge Q\,]}{\vdash_{ND} Q \wedge P}\ \wedge\mathrm{Comm}}{\vdash_{ND} P \wedge Q \supset Q \wedge P}$$






This theorem and its proof are represented in Twega by the following judgment: 



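    $\lambda P{:}o.\ \lambda Q{:}o.\ \mathit{impi}\ (\mathit{and}\ P\ Q)\ (\mathit{and}\ Q\ P)$
    $\qquad (\lambda u{:}\mathit{nd}\ (\mathit{and}\ P\ Q).\ \mathit{andcomm}\ P\ Q\ u)$
    $\qquad : \Pi P{:}o.\ \Pi Q{:}o.\ \mathit{nd}\ (\mathit{imp}\ (\mathit{and}\ P\ Q)\ (\mathit{and}\ Q\ P))$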

Replacement of andcomm by its expansion renders a more detailed proof. 



In the remainder of this paper, we mean $\vdash_\Sigma$ whenever we write
$\vdash$. We often write $\Gamma \vdash \varphi$ when there is some
$\mathcal{D}$ such that $\Gamma \vdash \mathcal{D} : \varphi$.
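
The idea that proof checking reduces to type checking can be illustrated by a small sketch. The following Python code is not Twega and makes no claim about its implementation. It is a deliberately simplified checker for the propositional fragment of Table 1, in which the formula parameters A and B of the rules are left implicit. The tuple encoding of formulas and proof terms and the function names infer and andcomm are assumptions made for illustration.

```python
# Formulas:    ("atom", name) | ("and", A, B) | ("imp", A, B)
# Proof terms: ("hyp", u)            use of the hypothesis named u
#              ("andi", p, q)        from nd A and nd B conclude nd (and A B)
#              ("andel", p)          from nd (and A B) conclude nd A
#              ("ander", p)          from nd (and A B) conclude nd B
#              ("impi", u, A, body)  discharge u : nd A, conclude nd (imp A B)


def infer(proof, ctx):
    """Return the formula proved by `proof` under the hypotheses in `ctx`."""
    tag = proof[0]
    if tag == "hyp":
        if proof[1] not in ctx:
            raise TypeError("unknown hypothesis " + proof[1])
        return ctx[proof[1]]
    if tag == "andi":
        return ("and", infer(proof[1], ctx), infer(proof[2], ctx))
    if tag in ("andel", "ander"):
        formula = infer(proof[1], ctx)
        if formula[0] != "and":
            raise TypeError("conjunction expected")
        return formula[1] if tag == "andel" else formula[2]
    if tag == "impi":
        _, name, assumption, body = proof
        return ("imp", assumption, infer(body, {**ctx, name: assumption}))
    raise TypeError("unknown rule " + str(tag))


def andcomm(p):
    """Derived rule, given by its expansion: andi (ander p) (andel p)."""
    return ("andi", ("ander", p), ("andel", p))


P, Q = ("atom", "P"), ("atom", "Q")
proof = ("impi", "u", ("and", P, Q), andcomm(("hyp", "u")))
theorem = infer(proof, {})
```

Here theorem is the formula imp (and P Q) (and Q P), mirroring the judgment above. Replacing the call to andcomm by its expansion gives the same result at a lower level of abstraction.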



3 Discourse Planning 

The dialog planner of P.rex plans the dialog by building a representation of the
structure of the discourse that includes speech acts as well as relations among 
them. Speech acts are the primitive actions planned by the dialog planner. Each 
speech act can always be realized by a single sentence. The discourse structure 
is represented in the declarative memory. 




P. rex: An Interactive Proof Explainer 419 



The plan operators are defined as productions. Each production either fulfills
the current goal directly or splits it into subgoals. Let us consider

    $\Gamma \vdash R\, c_1 \ldots c_m\, \mathcal{D}_1 \ldots \mathcal{D}_n : \psi$

where $R$ is an inference rule, $c_1, \ldots, c_m$ are parameters, and
$\mathcal{D}_i$ with $\Gamma \vdash \mathcal{D}_i : \varphi_i$ is the derivation
of $\varphi_i$ for $1 \le i \le n$.

An example for a production is:

(P1) IF the current goal G is to show $\Gamma \vdash \psi$
        and R is the most abstract rule known to the user justifying G
        and $\Gamma \vdash \varphi_1, \ldots, \Gamma \vdash \varphi_n$ are known to the user
     THEN produce the speech act
        (Derive :Reasons ($\varphi_1, \ldots, \varphi_n$) :Conclusion $\psi$ :Method R)
        and pop G (thereby storing $\Gamma \vdash \psi$ in the declarative memory).

By producing the speech act (which may be verbalized as “Since
$\varphi_1, \ldots, \varphi_n$, we obtain $\psi$ by $R$.”) the current goal is
fulfilled and can be popped from the goal stack. An example for a production
decomposing the current goal into several subgoals is:

(P2) IF the current goal G is to show $\Gamma \vdash \psi$
        and R is the most abstract rule known to the user justifying G
        and $\Phi = \{\varphi_i \mid \Gamma \vdash \varphi_i \text{ is unknown to the user for } 1 \le i \le n\} \neq \emptyset$
     THEN for each $\varphi_i \in \Phi$ push the goal to show $\Gamma \vdash \varphi_i$.

Note that the conditions of (P1) and (P2) only differ in the knowledge of the
premises $\varphi_i$ for rule $R$. (P2) introduces the subgoals to prove the
unknown premises in $\Phi$. As soon as those are derived, (P1) can apply and
derive the conclusion. Moreover, note that these productions are independent of
the calculus that is represented in Twega. However, there are also
calculus-specific productions. Cf. [4] for a more detailed discussion of plan
operators in P.rex.
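
The interplay of (P1) and (P2) on the goal stack can be sketched as follows. This Python code is not the ACT-R implementation of P.rex. It omits the user model, the choice of the most abstract rule known to the user, and all other productions. The names Step and plan, the speech act tuples, and the ASCII formula strings are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Step:
    """One proof step: a conclusion justified by a rule from premise steps."""
    conclusion: str
    rule: str
    premises: List["Step"] = field(default_factory=list)


SpeechAct = Tuple[str, Tuple[str, ...], str, str]


def plan(root: Step, known_to_user: Set[str]) -> List[SpeechAct]:
    known = set(known_to_user)
    goals = [root]                       # goal stack, top is the last element
    acts: List[SpeechAct] = []
    while goals:
        goal = goals[-1]
        if goal.conclusion in known:     # nothing left to explain
            goals.pop()
            continue
        unknown = [p for p in goal.premises if p.conclusion not in known]
        if unknown:                      # (P2): push one subgoal per unknown premise
            goals.extend(reversed(unknown))
        else:                            # (P1): derive, store the fact, pop the goal
            reasons = tuple(p.conclusion for p in goal.premises)
            acts.append(("Derive", reasons, goal.conclusion, goal.rule))
            known.add(goal.conclusion)
            goals.pop()
    return acts


hyp = Step("P&Q", "Hyp")
expanded = Step("Q&P", "AndI",
                [Step("Q", "AndEr", [hyp]), Step("P", "AndEl", [hyp])])
acts = plan(expanded, {"P&Q"})
```

On the expanded proof of Example 1 the sketch first derives Q, then P, and finally Q&P, which is the order of a detailed explanation of the conjunction step.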

The ability for user interaction is an important feature of explanation
systems. In P.rex, the user can interact with the system at any time. Each
interaction is analyzed by the analyzer and passed on to the dialog planner,
which then plans an appropriate reaction. [5] describes user interaction in
P.rex.

Example 2. Consider again the proof from Example 1. The explanation of this
proof by P.rex is displayed in Figure 2 with the user’s utterances underlined.

Note that the user interrupted the system when he did not understand the
step justified by $\wedge$Comm. The system entered a clarification dialog to
pinpoint the reason why the explanation did not succeed and re-planned the step
using the expansion of andcomm. After verifying that the user understood the
expansion, the dialog planner resumed the explanation of the proof.

Similarly, the system is able to re-plan a derivation at a higher level of
abstraction when the user complains that an explanation is too detailed.

4 Availability 

P.rex has been successfully used for the explanation of proofs found by
$\Omega$mega [2] and Twelf [10]. The distribution is available via the P.rex
home page at http://www.ags.uni-sb.de/~prex.

Theorem 1 ($(P \wedge Q) \supset (Q \wedge P)$).

Proof:

Let $(P \wedge Q)$. That implies that $(Q \wedge P)$ by $\wedge$Comm.

This step is too difficult.

$(Q \wedge P)$ since $(P \wedge Q)$ by $\wedge$Comm.

Do you understand this step? No.

Do you understand the premises? Yes.

Then $Q$ by $\wedge E_r$. $P$ by $\wedge E_l$. That implies that $(Q \wedge P)$ by $\wedge I$.
Do you understand this step? Yes.

Then $((P \wedge Q) \supset (Q \wedge P))$ by $\supset I$.

Fig. 2. An example explanation.
Abstract. JProver is a first-order intuitionistic theorem prover that
creates sequent-style proof objects and can serve as a proof engine in
interactive proof assistants with expressive constructive logics. This paper
gives a brief overview of JProver’s proof technique, the generation of proof
objects, and its integration into the Nuprl proof development system.

1 Introduction 

In large scale applications of automated reasoning, interactive proof assistants
such as Coq, HOL, Isabelle, Nuprl, and PVS are the tools of choice. Because of
their expressive logics, they are more generally applicable than first-order
tools, yet offer a much lower degree of automation.

JProver was developed in an effort to combine the expressive power of
interactive proof assistants with the automatic capabilities of first-order
theorem proving, both for reasoning about mathematics and for reasoning about
programs. It provides a theorem prover for first-order intuitionistic and
classical logic based on the connection method [3,10], a tool for generating
proof objects in the style of sequent proofs [11], and is coupled with
mechanisms for integrating the prover into the Nuprl proof/program development
system [4,1] and the MetaPRL proof environment [8,9]. These components enable a
user to invoke the automatic prover on proof goals that can be solved by
first-order reasoning while using the expressive logic of the proof assistant
for the more demanding proof parts. Furthermore, the proof information returned
by JProver enables the proof assistant to build a valid proof in its own
calculus.

As an example, Figure 1 shows the link between JProver and Nuprl, which
is described in detail in Section 3. JProver is a stand-alone prover that
communicates with a proof assistant through a logic module. Invoking JProver on
a Nuprl subgoal sequent causes this sequent to be sent to JProver. The
proof-search method in JProver will then generate a matrix proof from the
corresponding formula tree (provided the sequent is valid), which will then be
converted into a list of sequent rules that expresses a sequent proof for the
formula. Upon receiving this list, Nuprl will build a sequent proof for the
original goal sequent, thus confirming that the proof found is valid.
Information about the relation between this sequent and the formula proven by
JProver will be used during that step.

Fig. 1. Architecture of JProver in connection with Nuprl



Over the past years there have been various approaches to combining
interactive proof assistants with automatic proof tools [2,17]. Our application
differs from these in that we provide a fully automatic theorem prover for
classical and intuitionistic first-order logic with a very compact search
space. A user may trust its results or expand them in order to inspect the
proof. Furthermore, JProver supports constructive logic and is thus well suited
for reasoning about programs.

Although this paper focuses on the integration of JProver into Nuprl and
MetaPRL, the underlying mechanisms are quite general and might easily be
adapted to integrate JProver into other proof assistants for constructive and
classical logics. In the rest of this paper we shall briefly discuss JProver’s
proof search procedure, the tool for generating proof objects, and the
mechanisms for integrating JProver into the Nuprl and MetaPRL proof development
systems.

2 JProver: Proof Search and Transformation 

JProver implements a full first-order theorem prover for classical and
intuitionistic logic that realizes the connection-based proof procedure
presented in [10]. It transforms a set of first-order sequent formulas into a
set of formula trees, which are annotated by tableau types, polarities, and
so-called prefixes. During the proof process, JProver identifies connections
between pairs of atoms and checks whether each path through the formulas
contains such a connection. The formula is valid if each of these connections
is complementary, that is, if the connected atomic formulas can be unified by a
global term substitution and, for intuitionistic validity, if their prefixes
can be unified as well. To compute the prefix substitution, we use a
specialized string unification algorithm based on [14]. The resulting matrix
proof is a reduction ordering that consists of the original formula trees
together with the connections and non-permutability constraints induced by the
substitutions.
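
The path condition at the heart of the connection method can be illustrated on the simplest possible case. The following Python sketch is not JProver’s procedure: it is restricted to classical propositional logic, works on a clausal matrix instead of annotated formula trees, enumerates all paths naively, and needs neither term unification nor prefixes. The encoding of literals as (name, polarity) pairs and the function names are assumptions made for illustration.

```python
from itertools import product


def complementary(a, b) -> bool:
    """Two literals form a connection if they have the same atom and opposite polarity."""
    return a[0] == b[0] and a[1] != b[1]


def is_valid(matrix) -> bool:
    """matrix: disjunctive normal form, a list of clauses (conjunctions of literals).

    The formula is classically valid iff every path through the matrix,
    i.e. every choice of one literal per clause, contains a connection.
    """
    for path in product(*matrix):
        if not any(complementary(a, b)
                   for i, a in enumerate(path) for b in path[i + 1:]):
            return False            # an open path refutes validity
    return True


# (P and Q) implies (Q and P), i.e. not P, or not Q, or (Q and P)
and_comm = [[("P", False)], [("Q", False)], [("Q", True), ("P", True)]]
# P implies Q, i.e. not P, or Q
not_valid = [[("P", False)], [("Q", True)]]
```

For and_comm both paths contain a connection, so the formula is valid, whereas the single path through not_valid is open.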

JProver’s converter component uses the algorithms described in [11,15] to
reconstruct a first-order sequent proof from the classical or intuitionistic
matrix proof. It essentially transforms the reduction ordering into a linear
order and constructs a sequent rule for each node, using the term substitution
to instantiate quantified variables. Since additional proof knowledge from the
matrix proof is exploited, proof reconstruction can be done without search
[15,16].

The selection of the target sequent calculus for proof reconstruction depends
on the calculus underlying the connected proof assistant. For the intuitionistic
case, JProver first generates a multiple-conclusioned sequent proof [6] because
of its proof-theoretical closeness to the matrix proof. If needed, this proof
can further be transformed into a single-conclusioned sequent proof [7] using a
second conversion step as described in [5]. Nuprl, for instance, requires a
single-conclusioned proof whereas MetaPRL does not. The resulting sequent proofs
can be used to generate proof objects in order to validate, check, or guide
proof construction in the interactive proof assistants.

JProver is implemented in OCaml as a stand-alone theorem prover. However, it
is embedded into the MetaPRL environment [9], which allows it to use MetaPRL’s
quantifier unification algorithm as well as its module system for communicating
with interactive proof assistants.



3 Integration into Interactive Proof Assistants 

JProver is implemented on top of the MetaPRL core, using MetaPRL as a toolkit 
that provides the basic functionality — term structure, substitution, unification, 
etc. JProver takes as its input a small JLogic module that represents the logic 
of the proof assistant with which JProver will cooperate. The JLogic module 
describes which terms implement logical connectives, how to access subterms 
from those connectives, and how to convert JProver’s generic representation of 
a sequent proof into the internal data structures of the proof assistant. 

In order to be able to call JProver from some proof assistant, one would need
to write a logic module that consists of two components: a piece of OCaml code
for communicating with that proof assistant (using whatever communication
protocol developers would choose) and a JLogic module capable of decoding
the sequent received from that communication code and of encoding JProver’s
response into a form the communication code expects.

Currently we have integrated JProver into the MetaPRL and Nuprl systems. 
The technical integration of JProver into MetaPRL is straightforward, as JProver 
is a module in MetaPRL’s code base. MetaPRL can communicate with it simply 
by making a function call. The logical module of the MetaPRL type theory passes 
its formulas directly to JProver and the JLogic module for MetaPRL converts 
JProver’s sequent proof into a MetaPRL tactic, which will generate a MetaPRL 
proof for the proof goal. 

The integration into Nuprl (Figure 1) is not as straightforward. Calling 
JProver from a Nuprl sequent requires Nuprl to preprocess the goal and the list 
of hypotheses and to send them to a MetaPRL process running JProver. The 
preprocessing accounts for differences in the representation of variables and ap- 
plications of terms, and also addresses differences in the type theory semantics. 
For example, JProver, as a first-order intuitionistic prover, cannot understand
type information contained in Nuprl’s sequents. We can, however, encode the 
type information as a logical predicate which is understood, and then later rein- 
terpret JProver’s results to fit the original sequent. In most cases, however, the 
logical proof does not depend on type information. We simply discard it if the 
sequent mentions only a single type. 

To communicate the processed sequent, the Nuprl/JProver link takes advantage
of the open architecture of the Nuprl Logical Programming Environment [1],
which supports communication with external proof tools by sending terms in
MathBus format [13] over an INET socket. Since most of the terms in the sequent
are left unchanged, the common MathBus format is valuable for bridging the
differing syntax of the linked systems. Once the sequent is sent, the JLogic
module for Nuprl describes how JProver can access the semantical information of
its terms and also how to convert JProver’s resulting sequent proof into a list
of sequent rules with parameters that Nuprl can then interpret. From this list
of rules, Nuprl then builds a proof tree for the original sequent in a
depth-first, left-to-right fashion.
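
The depth-first, left-to-right replay of a rule list can be sketched as follows. This Python code does not reflect the actual data format exchanged between JProver and Nuprl. It assumes, purely for illustration, that each entry of the list is a pair of a rule name and the number of subgoals the rule creates, and that the rule names are hypothetical.

```python
def build_tree(rules):
    """Rebuild a proof tree from a preorder list of (rule name, number of subgoals)."""
    it = iter(rules)

    def node():
        entry = next(it, None)
        if entry is None:
            raise ValueError("rule list ends before all subgoals are closed")
        name, subgoals = entry
        # Subgoals are processed from left to right, each one completely
        # (depth first) before the next one is started.
        return (name, [node() for _ in range(subgoals)])

    tree = node()
    if next(it, None) is not None:
        raise ValueError("rules left over after the proof is complete")
    return tree


# Hypothetical rule list for a sequent proof of (P and Q) implies (Q and P).
rules = [("impR", 1), ("andL", 1), ("andR", 2), ("hyp", 0), ("hyp", 0)]
tree = build_tree(rules)
```

A list that is too short or too long is rejected, which corresponds to the proof assistant failing to complete the proof instead of trusting the prover.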

Neither MetaPRL nor Nuprl relies on the correctness of JProver or of the
preprocessing. Instead, JProver’s output provides these systems with a proof
strategy, which is then executed on the original sequent in the respective
environment.



4 Progress and Availability 



The connection between JProver and Nuprl is an example in which hybrid proofs,
i.e. proofs created by multiple provers with different formalisms, have been
successfully and verifiably generated. It gives a user the full expressive power
of the proof assistant when dealing with complex proofs and verifications, while
at the same time taking advantage of well-understood and efficient proof
techniques for subproblems that only depend on first-order reasoning.

A snapshot from a proof of the “Agatha Murder Puzzle” is depicted in Figure 2
and illustrates the cooperation of JProver with Nuprl. After the first step
the user invokes JProver through a Nuprl tactic, which completely proves the
goal (left window). To inspect proof details, the user may request the complete
sequent proof with elementary rules to be displayed (right window). Experience
has shown that this option has considerable educational value.

It should be noted that JProver is not restricted to the syntax of first-order 
logic: unknown terms are simply treated as uninterpreted function or predicate 
symbols. This allows us to apply JProver to proof problems that are usually out- 
side the range of first-order provers and to combine it with other proof techniques 
that are available to proof assistants. 

In the future we intend to extend JProver’s capabilities by coupling it with
Nuprl tactics and decision procedures. We also intend to strengthen the prover
component by adding mechanisms for inductive theorem proving described in
[12] and modules for handling modal logics and fragments of linear logic [10,11].
[Figure 2: screenshot of the Nuprl proof windows. The recoverable text lists
eight hypotheses about Agatha, Charles and the Butler and shows sequent steps
of the form "BY allL (3) The Butler" and "BY allL (4) Agatha".]

Fig. 2. The Nuprl/JProver link: proving the “Agatha Murder Puzzle”



These modules will make JProver valuable for a variety of other proof assistants. 
We plan to build the corresponding interfaces as well. 

Although JProver’s main emphasis is not high performance but bringing the
advantages of connection-based theorem proving, such as complete and efficient
search, into tactic-based proof assistants, we plan to incorporate well-known
techniques for speeding up automated theorem provers in order to improve
JProver’s performance as a stand-alone prover.

JProver is a part of the MetaPRL code base and can be downloaded from
MetaPRL’s home page [9]. An executable copy of Nuprl running under Linux is
available at

http://www.cs.cornell.edu/Info/Projects/NuPrl/nuprl5/index.html
Abstract. This paper presents an algorithm, XLNH, to generate finite
models of first order equational theories. Unlike conventional methods,
which focus on using as few individual constants as possible to preserve
symmetries, XLNH heuristically selects and then fully generates the
functions that appear in the problem, using a weighted directed graph of
functional dependency. One key issue here is to constructively generate
isomorphic partial models and then further exploit the resulting symmetries.
This algorithm proves very efficient on problems involving a unary
bijective function $f$ (like the additive inverse in a group or ring theory).
When such a bijection is fully instantiated, XLNH statically exploits
remaining isomorphic subspaces. These ideas are implemented using the
public domain SEM software framework, and give order of magnitude
improvements on many problems. These results are interesting on their
own but potentially generalize to many practical CSP applications.



1 Introduction 

Equational theories provide a great number of difficult problems. Zhang [9]
defines a set of problems which can serve as a challenge for finite model
search systems. Several open mathematical problems were solved with different
approaches: FALCON [10], FINDER [6], MGTP-G [3], LDPP, SATO [8], FMC [5] and
MACE [4].

An equational theory is a set of axioms: first order logic formulas involving
equality (e.g. $h(f(x,y)) = f(z,x)$). We consider here theories in which all
the variables are universally quantified. Finding a finite model for such
a theory amounts to finding an interpretation of functional symbols over a
finite domain which satisfies all axioms. The existence of a model demonstrates
the consistency of the theory. The existence of a counter model may refute a
conjecture.

Finding a model of an equational theory can be viewed as a special kind of
constraint program, where the constraints are highly symmetrical. Symmetries
arise because all constraints are universally quantified. Known approaches to
finite model search have explored ways to tackle those symmetries. MGTP-G
[3] uses ad hoc axioms to filter out some symmetries statically. FALCON [10]
and SEM [11] use a dynamic cut and heuristic procedure (LNH: Least Number
Heuristic) to avoid exploring symmetrical subspaces during search. On the other
hand, many constraint programs of practical or industrial interest exhibit a
subproblem having functional semantics.

Our approach generalizes the LNH heuristic to avoid exploring isomorphic 
subspaces. The new heuristic can be used with many difficult problems, and 
gives impressive performance improvements in all cases. 

The paper is organized as follows: section 2 defines equational theories. Sec- 
tion 3 describes the model equivalence proposition. The basic principles of the 
enumeration procedure are discussed in section 4. In section 5 we describe the 
function selection strategy. Experimental results are listed in section 6. Section 
7 gives a conclusion. 

2 Equational Theories 

2.1 Syntax 

We use a subset $\mathcal{L}$ of first order logic, without existential
quantifiers, with equality as the only predicate $\{=\}$. In $\mathcal{L}$, all
the variables are universally quantified. The disequality symbol $\{\neq\}$
denotes the negation of equality. The set of variable names is
$\{x, y, z, x_1, x_2, \ldots\}$. Constants are either integers from the set
$\{0, 1, 2, \ldots\}$ or identifiers (most often a letter from the set
$\{a, b, c, k, k_1, k_2, \ldots\}$). A functional symbol can be any identifier
not ambiguous with one of the previous categories, most often a letter from the
set $\{f, g, h, \ldots, r, s\}$. A term is recursively built upon functional
symbols, variable names and constants.



    $h(x, 0) = x$
    $h(0, x) = x$
    $h(x, g(x)) = 0$
    $h(g(x), x) = 0$
    $h(h(x, y), z) = h(x, h(y, z))$
    $h(x, y) = h(y, x)$



Fig. 1. Abelian Group Axioms 



Since all variables are universally quantified, universal quantifiers are usually
omitted in the axioms for simplicity. Figure 1 illustrates the possibilities
offered by the language. $\mathcal{L}$ is rich enough to formulate the axioms of
mathematical objects like abelian groups or unit rings. Because $\mathcal{L}$
has only one predicate, equality, sets of $\mathcal{L}$ axioms are commonly
called “equational theories”. It is of considerable interest to mathematicians
to prove or refute the existence of finite structures satisfying axioms in
$\mathcal{L}$. Hence $\mathcal{L}$ is at the same time an excellent
experimentation basis and a field of application. The concepts introduced in
this paper can be extended to richer languages like the many sorted first order
language used as input to the first order finite model generator SEM [11].
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2.2 Semantics 

We use traditional naming conventions in the field of CSP-based finite model 
generation (FALCON [10], SEM [11]). Without loss of generality, individuals 
are taken from the set N = {0, 1,2,...} of natural numbers. Since we are only 
interested in finite models, we interpret a theory T in £ on a finite set = 
{0, l,2,...n— 1}. Constants (integers) are interpreted as themselves. We call 
a cell the ground term f{ei, . . . Ck) where all Ci belong to Cells map to 
constraint variables in the associated constraint problem. D„ is called the domain 
of these variables. The members of Dn are called individuals. An interpretation 
In (or simply I) of a theory T maps each cell to a value from _D„. The resulting 
structure defines an operation table for every function that appears in T (for 
instance the set: {/i(0,0) = 0, /i(0, 1) = 2, /i(0,2) = 3...}). A model I of order 
n of a theory T is an interpretation on _D„ which satisfies all the theory axioms. 

Let f be a function, and I an interpretation. We naturally define the
interpretation I_f of f as the restriction of I to f cells. We often use
f(e_1, ..., e_k) to denote I(f(e_1, ..., e_k)) when not ambiguous. Let g be a
unary function; we may also use g^i(x) to denote I(g(I(g(...(x))))), with i
nested applications of g.



2.3 A CSP Approach to Model Generation 

An equational theory can be viewed as a special kind of constraint program,
a triple (V, D, C) where V is the set of constraint variables, D is the domain
(or set of possible values) for these variables, and C is a set of constraints,
i.e. relations listing possible combinations of variable values.

Here, the set V of variables is the set of function cells and the domain D
is the set D_n. Different approaches exist to implement the constraints (more
efficiently than as extensive lists of compatible tuples). Enumerative model
generators usually rely upon constraint propagation algorithms, with a tradeoff
between propagation efficiency (i.e. the completeness of the decisions made by
the propagation algorithm alone) and the cost of maintaining the associated data
structures. FMSET [2] experimented with boolean propagation and a clause
flattening technique (to achieve ultimate propagation efficiency), at the expense
of additional memory costs. SEM (the public domain tool we based our experiments
upon) uses the terminal instances of the axioms and propagates newly known
values of function cells upwards in the structure. SEM compensates for the loss
of most downward propagations by more concise and efficient data structures and
indexing. A potentially useful (non symmetry aware) improvement of finite model
generation may be achieved using lookahead strategies, as shown in [1].

Before the search starts, SEM generates all the terminal instances corresponding
to every axiom in the theory. For instance, the axiom h(x, g(x)) = 0 (group
inverse) expands to h(0, g(0)) = 0, h(1, g(1)) = 0, h(2, g(2)) = 0, ...
These terminal axioms are stored in memory using a pointer based representation
that allows for fast upward propagation. Whenever a leaf cell value becomes
known (e.g. g(0)), its actual value is substituted in all the constraints where
it appears (which may generate a new cell value: here h(0, 0) = 0), and the
process repeats to fix point or failure as long as new cell values are introduced.
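
The grounding and upward propagation just described can be sketched in a few
lines of Python. This is only an illustration under our own assumptions: terms
are nested tuples such as ("h", "x", ("g", "x")), variables are strings,
individuals are integers, and the names terminal_instances, evaluate and
propagate are hypothetical. SEM's actual pointer based data structures are not
reproduced here.

```python
from itertools import product

def ground(term, env):
    """Replace variables (strings) by individuals in a nested-tuple term."""
    if isinstance(term, int):
        return term
    if isinstance(term, str):
        return env[term]
    return (term[0],) + tuple(ground(t, env) for t in term[1:])

def terminal_instances(lhs, rhs, variables, n):
    """Yield every ground instance of the axiom lhs = rhs over D_n."""
    for values in product(range(n), repeat=len(variables)):
        env = dict(zip(variables, values))
        yield ground(lhs, env), ground(rhs, env)

def evaluate(term, cells):
    """Reduce a ground term bottom-up using the known cell values."""
    if isinstance(term, int):
        return term
    args = tuple(evaluate(t, cells) for t in term[1:])
    reduced = (term[0],) + args
    if all(isinstance(a, int) for a in args) and reduced in cells:
        return cells[reduced]
    return reduced

def is_cell(term):
    """A cell is a function symbol applied to individuals only."""
    return isinstance(term, tuple) and all(isinstance(a, int) for a in term[1:])

def propagate(instances, cells):
    """Upward propagation to a fixed point; returns False on contradiction."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in instances:
            l, r = evaluate(lhs, cells), evaluate(rhs, cells)
            if isinstance(l, int) and isinstance(r, int):
                if l != r:
                    return False
            elif is_cell(l) and isinstance(r, int):
                cells[l] = r
                changed = True
            elif is_cell(r) and isinstance(l, int):
                cells[r] = l
                changed = True
    return True

# The group inverse axiom h(x, g(x)) = 0 on D_3, with g(0) = 0 already known.
instances = list(terminal_instances(("h", "x", ("g", "x")), 0, ["x"], 3))
cells = {("g", 0): 0}
propagate(instances, cells)   # derives the new cell value h(0, 0) = 0
```

The sketch only propagates upwards, in the spirit of the description above: a
value is derived for a cell when all its arguments are known and the other side
of the equation has already been reduced to an individual.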



3 Model Isomorphism 

When building an equational theory model, some isomorphic branches in the
search tree can be cut by observing that all individuals i from D_n that were
not used as a cell index or as a cell value in previous choice points are
interchangeable. This intuition led to the implementation of the Least Number
Heuristic (LNH) in FALCON [10], a program that solved several open problems for
the first time.

Because equational theories are highly symmetrical, the LNH alone does not
cut all unwanted search branches. This research focuses on that issue, exploring
ways to retrieve part of the original symmetries, even after all individuals have
been used. Most equational theories involve a unary bijection (as in group or
ring axioms), or can be adapted to involve one (as in quasigroups). This section
proves a proposition that leads to an improved model generation procedure:
the search starts by generating a model of a unary function if it exists. Once
this model has been computed, the search can proceed from a state where the
remaining symmetries can be deterministically suppressed and thus require no
dynamic tests.

Definition 1. Let T be a theory, I an interpretation, E a subset of D_n, and
f a (unary) functional symbol. We define f(E) as the set {I(f(e)) | e ∈ E}. As
usual, we also define f^{-1}(E) as the set {e ∈ D_n | I(f(e)) ∈ E}.

Definition 2. Let T be a theory, g a unary function and I_g an interpretation of
g on D_n = {0, ..., n-1}. An inclusion minimal subset c of D_n such that g(c) = c
is called a cycle. An element i ∈ D_n appears in at most one cycle, called c_i.
We define the size size(c) of a cycle c as |c| - 1.

Definition 3. Let g be a unary function and I_g an interpretation of g on D_n.
The inclusion maximal subset D_{I_g} of D_n such that
g(D_{I_g}) = D_{I_g} = g^{-1}(D_{I_g}) is the bijective restriction of g.

Note that such a bijective restriction is not the union of all the cycles, except
when I_g interprets g as a bijection. In that case, we even have D_{I_g} = D_n.



Example 1. Let g be a unary function under the following interpretation I_g.

    i      0  1  2  3  4  5  6  7  8  9  10  11
    g(i)   1  2  2  3  4  6  5  8  7  9   9   1

Under the interpretation I_g, the elements of D_{I_g} (here the set {3, ..., 8})
belonging to cycles of equal sizes remain interchangeable (e.g. 5 and 7).
Obviously however, the individual 2 is not interchangeable with 3.
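
The cycles and the bijective restriction of Example 1 can be recomputed with a
short Python sketch. It assumes that g is given as a list where g[i] is the
image of i; the function names are ours. It relies on the observation that a
set D with g(D) = D = g^{-1}(D) must be a union of cycles into which no outside
element is mapped, so the maximal such set is the union of all cycles c with
g^{-1}(c) = c.

```python
def cycles(g):
    """Return the cycles of g as sorted lists of individuals."""
    n = len(g)
    found, visited = [], set()
    for start in range(n):
        # After n applications of g we are necessarily on a cycle.
        x = start
        for _ in range(n):
            x = g[x]
        if x in visited:
            continue
        cycle = [x]
        y = g[x]
        while y != x:
            cycle.append(y)
            y = g[y]
        visited.update(cycle)
        found.append(sorted(cycle))
    return sorted(found)

def bijective_restriction(g):
    """Union of the cycles c whose preimage under g is exactly c."""
    keep = set()
    for c in cycles(g):
        preimage = {e for e in range(len(g)) if g[e] in c}
        if preimage == set(c):
            keep.update(c)
    return sorted(keep)

# Interpretation of g taken from Example 1.
g = [1, 2, 2, 3, 4, 6, 5, 8, 7, 9, 9, 1]
restriction = bijective_restriction(g)
# Cycle sizes in the sense of Definition 2 (|c| - 1), inside the restriction.
sizes = [len(c) - 1 for c in cycles(g) if set(c) <= set(restriction)]
```

On this input the cycles {2} and {9} are excluded because 1 and 10 are mapped
into them from outside, which leaves {3, ..., 8} as stated in the example.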



These intuitions lead to Proposition 1 below.

Proposition 1. Let T be an axiom system with a unary function g, and I_n a
model of T. Let h be a function in T (h ≠ g), h(i_1, ..., i_k) a cell, and
v ∈ D_{I_g} such that I_n(h(i_1, ..., i_k)) = v and v ∉ c_{i_j} for all i_j.
Then for every w ≠ v s.t. |c_w| = |c_v| and w belongs to none of the c_{i_j},
there exists an isomorphism transforming I_n to a model I'_n of T in which
I'_n(h(i_1, ..., i_k)) = w.

Proof. Let c_v and c_w be the (non empty) cycles of v and w. There are two cases:

- c_v ≠ c_w: let σ : D → D be the isomorphism equal to the identity everywhere
  but on c_v and c_w, which maps g^i(v) to g^i(w) and g^i(w) to g^i(v) for all
  i in [0..|c_v|).

- c_v = c_w = c: let σ be the isomorphism equal to the identity everywhere but
  on c, which maps g^i(v) to g^i(w) for all i in [0..|c|).

By definition, σ is such that σ(g(i)) = g(σ(i)). σ naturally extends to a model
isomorphism by mapping any ground assignment h(i_1, ..., i_n) = v (where h ≠ g)
to h(σ(i_1), ..., σ(i_n)) = σ(v), so that σ(h(i_1, ..., i_n)) =
h(σ(i_1), ..., σ(i_n)) for all i_1, ..., i_n in D_n. These conditions, together
with the fact that σ is bijective and that all universally quantified axioms are
valid under I_n, ensure that every terminal instance of any axiom t_1 = t_2 of
the theory T is valid under σ(I_n). □



Example 2. Assume we want to generate abelian groups (cf. the axioms of Figure 1)
of order 5. Let I_n be a model of AG that interprets g as I_g below:

    i      0  1  2  3  4
    g(i)   0  1  2  4  3

If I_n interprets h(1, 2) as 3, we know that there exists an isomorphic model
I'_n where h(1, 2) = 4. This property can be used in the enumeration procedure
to avoid exploring symmetrical search spaces.
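
The renaming used in the proof of Proposition 1 can be made concrete with the
following Python sketch. It is an illustration under our own conventions (g is
a list, v and w are assumed to lie on cycles of equal length, and the names
orbit, swap_permutation and commutes are hypothetical), not code taken from SEM.

```python
def orbit(g, v):
    """Cycle of v listed as v, g(v), g(g(v)), ...; v must lie on a cycle."""
    out, x = [v], g[v]
    while x != v:
        if len(out) >= len(g):
            raise ValueError("v is not on a cycle of g")
        out.append(x)
        x = g[x]
    return out

def swap_permutation(g, v, w):
    """Build the permutation sigma of the proof, with sigma(g^i(v)) = g^i(w)."""
    cv, cw = orbit(g, v), orbit(g, w)
    if len(cv) != len(cw):
        raise ValueError("cycles of v and w must have the same length")
    sigma = list(range(len(g)))
    if set(cv) != set(cw):
        # Distinct cycles: exchange them position by position.
        for a, b in zip(cv, cw):
            sigma[a], sigma[b] = b, a
    else:
        # Same cycle: rotate it so that v is sent to w.
        for a, b in zip(cv, cw):
            sigma[a] = b
    return sigma

def commutes(g, sigma):
    """Check sigma(g(i)) = g(sigma(i)) for every individual i."""
    return all(sigma[g[i]] == g[sigma[i]] for i in range(len(g)))

# Example 2: g swaps 3 and 4; sending v = 3 to w = 4 turns h(1, 2) = 3
# into h(sigma(1), sigma(2)) = sigma(3), that is h(1, 2) = 4.
g2 = [0, 1, 2, 4, 3]
sigma = swap_permutation(g2, 3, 4)
renamed_cell = (sigma[1], sigma[2], sigma[3])
```

Because the individuals 1 and 2 lie outside the cycle of 3 and 4, the renaming
leaves the cell indices untouched and only changes the value, which is what the
heuristic exploits.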

4 Enumeration Procedure 

Proposition 1 shows that some model isomorphisms remain when a partial
interpretation for a unary function g has been computed. The best situation
occurs if the theory axioms ensure that g is bijective, since in that case
D_{I_g} = D_n and is maximal in size. This suggests starting the model
generation by completely producing a model for a bijective function g, if it
exists, to exploit the remaining symmetries further. In the rich domain of group
and ring theories, the group inverse function g is not only bijective, but
satisfies g^2(i) = i for all i ∈ D_n. Cycles in that case are of size 0 or 1,
which even further reduces the number of different g interpretations.

It is easy to generate only non isomorphic (canonic) interpretations of a
bijective unary function. The idea simply is to generate the function so that its
cycles are increasing, or decreasing, in size, as the following proposition
suggests:

Proposition 2. Let g be a unary function and I_g an interpretation of g on D_n.
There exists an integer l in D_n and a permutation σ on D_n mapping I_g to an
interpretation σ(I_g) such that in σ(I_g):

- D_{I_g} = {0, ..., l}

- for every cycle c, c elements are consecutive integers and only the highest
  element e in c is such that g(e) ≤ e.

- for every two consecutive cycles c_1 and c_2, |c_1| ≤ |c_2|

This proposition is easily proved by iteratively building σ as the appropriate
renaming on D_n. Because an equational theory involves constants (like the neutral
element in group axioms), for completeness reasons, proposition 2 cannot be used 
as such in an enumeration procedure. Constants appearing in equalities must be 
interpreted first by the algorithm. The chosen values are not interchangeable 
with any other. In addition, these individuals may belong to a cycle of g, or not 
(in that case, the function g is not bijective). Technically, the individuals
selected to interpret the p constants k_j may without loss of generality satisfy
I(k_j) ≤ j - 1 for j ∈ {1..p}. This leads to the following proposition:

Proposition 3. Let k_1, ..., k_p be p constants, I_{k_i} their interpretation,
and g a unary function. Let I_g interpret g on D_n. There exist two integers m
and l in D_n (m < l) and a permutation σ on D_n mapping I_g to an interpretation
σ(I_g) such that in σ(I_g):

- g^*({I_{k_i}}) = {0, ..., m-1}

- D_{I_g} - g^*({I_{k_i}}) = {m, ..., l}

- for every cycle c ⊆ {m, ..., l}, c elements are consecutive integers and only
  the highest element e in c is such that g(e) ≤ e.

- for every two consecutive cycles c_1 ⊆ {m, ..., l} and c_2 ⊆ D_{I_g},
  |c_1| ≤ |c_2|



Definition 4. An interpretation I_g satisfying the requirements of Proposition 3
is called canonic. Let c be a cycle in I_g. The smallest element s in c is called
start(c), and the highest element e in c is called end(c).

According to the above statements, given g a unary function and I_g a canonic
interpretation, it is clear that size(c) = end(c) - start(c) for every cycle c.
In Example 1, if we have one constant interpreted as 0, we have m = 3, l = 8,
and {3, ..., 8} contains 4 cycles of respective sizes 0, 0, 1, 1. Note that two
cycles are not in {3, ..., 8}: g(2) = 2 and g(9) = 9.

Model generation uses the two propositions 1 and 3 to exclude isomorphic 
subspaces when building models of a theory T involving a unary function g. It 
operates in three steps. 

4.1 Step 0 

The algorithm interprets the p constants appearing in equalities so that
I(k_0) = 0 and I(k_j) ≤ I(k_{j-1}) + 1. This constructs a set {0, ..., q} of
integers (q < p).

4.2 Step 1

It then selects a function g from the problem statement and iteratively generates
all its canonical models, as described in Proposition 3. The idea simply is to
construct I_g so that

- g^*({0, ..., q}) = {0, ..., m-1}

- D_{I_g} - g^*({0, ..., q}) = {m, ..., l} for some m, l (note that there may
  exist cycles in {0, ..., m-1}).

- {m, ..., l} cycles only contain consecutive elements (note that this property
  cannot be ensured for the cycles that appear over {0, ..., m-1})

- for all i in {m, ..., l}, either g(i) = i + 1 or i is the end of a cycle and
  thus g(i) = i - size(c_i)

- the cycles in {m, ..., l} are increasing in size (note again that this property
  cannot be ensured for the cycles that appear over {0, ..., m-1})

This procedure generates only a limited number of interpretations having
isomorphic bijective restrictions. In the case of a bijection, only non
isomorphic models are generated. For example, using this strategy, the generation
of bijective functions of order 6 only produces the eleven non isomorphic models,
whereas using the LNH produces 32 models.
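
The count of eleven can be checked with a small Python sketch that enumerates
canonical bijections directly. It makes simplifying assumptions of ours: there
are no constants (so m = 0 and the bijective restriction is the whole domain),
and the function names are hypothetical. Each canonical bijection corresponds to
one way of writing n as a non-decreasing sequence of cycle lengths, laid out on
consecutive integers.

```python
def partitions(n, smallest=1):
    """Yield the non-decreasing sequences of parts >= smallest that sum to n."""
    if n == 0:
        yield []
        return
    for first in range(smallest, n + 1):
        for rest in partitions(n - first, first):
            yield [first] + rest

def canonical_bijections(n):
    """Yield each canonical bijection on {0, ..., n-1} as a list g."""
    for lengths in partitions(n):
        g, start = [], 0
        for length in lengths:
            end = start + length - 1
            g.extend(range(start + 1, end + 1))  # g(i) = i + 1 inside a cycle
            g.append(start)                      # g(end) = end - size(c)
            start = end + 1
        yield g

count_order_6 = len(list(canonical_bijections(6)))   # 11 canonical bijections
first_order_5 = next(canonical_bijections(5))        # the identity on D_5
```

The number of such sequences is the number of integer partitions of n, which is
11 for n = 6, in agreement with the figure quoted above.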



4.3 Step 2 

For every such generated I_g, the algorithm enumerates possible models for the
theory using an almost standard CSP enumeration procedure, described as
Algorithm 1. We say that an index i has been hit by the algorithm when either

- the value i has been assigned to a cell

- the value of a cell h(i_1, ..., i_n) has been chosen, or is currently under
  consideration, and i = i_j for some j in {1, ..., n}.

We say that a cycle has been hit when one of its members has been hit. The
algorithm keeps track of a high water mark called mdn_s for every cycle size s.
The value mdn_s represents the end index of the highest cycle of size s that has
been hit. Remember that cycles with equal sizes are consecutive. Let v ∈ D_{I_g};
we note mdn(v) the value mdn_{|c_v|}. By Proposition 1, when selecting a value
v for a cell h(i_1, ..., i_k), only values smaller than mdn(v) + 1 need to be
tried.

After an interpretation of g has been computed, the index m identifying the
start of the bijective restriction section is known, and thus all mdn values are
set equal to max(0, m - 1).

Traditional implementations of the LNH (as in [10]) use only one mdn value.
They attempt to favor the application of the heuristic by selecting new cells so
that the mdn does not change, for as long as possible. To achieve this, it is
enough to select cells h(i_1, ..., i_k) with indexes smaller than mdn. If this
is impossible, a new cell is selected that yields the smallest possible change to
the mdn values. In practice, it suffices to choose h(i_1, ..., i_k) so that
max(i_1, ..., i_k) is the smallest.

Algorithm 1 The enumeration procedure

function XLNH(S: set of assignments, F: set of terminal axioms): boolean;
begin
    forall non propagated assignments a in S, Propagate(a, F, S)
    if S contains incompatible assignments then return(false)
    if F is empty then return(true)
    select unbound cell b(i_1, ..., i_n)
        (so that max(i_1, ..., i_n) is the smallest)
    update mdn(i_j) for all i_j
    forall v ∈ D s.t. v < mdn(v) + 1
        if XLNH(S ∪ {b = v}, F) then return(true)
    return(false)
end



This strategy however picks candidate cells in all existing functions (including
functions that depend upon the value of others, which thus should not be taken
into consideration here).

Our approach focuses on the generation of an interpretation for all functions,
one after another. The algorithm thus assigns values to all the cells of a
function before moving on to the next function. This prevents spreading the model
generation over all function symbols. During the generation of a function, we use
an extended version of the LNH that treats as equivalent the individuals
belonging to the same cycle, or to cycles of equal sizes, as long as these
individuals have not been "hit".





Fig. 2. LNH versus XLNH 



Figure 2 illustrates the difference between the LNH and the XLNH heuristics.
With LNH, after a certain search tree depth is reached, all values are used and
are no longer interchangeable (as illustrated in the leftmost diagram in
Figure 2). With XLNH, after a canonical interpretation of the unary function g
has been computed, existing cycles in the bijective restriction let some
individuals become interchangeable again (as in the rightmost diagram in
Figure 2). Hence the XLNH heuristic makes it possible to first compute in a
deterministic way the canonical models of a unary function, then statically
exploit remaining isomorphisms to prune the search. The implementation of the
heuristic is thus entirely static, and requires no complex run time tests. The
only tests performed are integer comparisons, at negligible cost.



Example 3. Figure 3 illustrates the generation of all models of order 5 abelian
groups (cf. Figure 1). Only three canonic interpretations of g are generated. In
Figure 3, bracket surrounded values are the ones suppressed from the search tree
by the XLNH heuristic (by Proposition 1). Missing branches are suppressed by
constraint propagation. The □ symbol represents inconsistency. The symbol M
represents the discovery of a model.

By looking at this diagram, we can observe that the first generated
interpretation of g is the identity, all cycles having the same size zero. In
that case, the rest of the search proceeds as with the LNH starting with
mdn = 0. This results in suppressing many isomorphic interpretations.

5 Function Selection Strategy

Experimental results show that the order in which functional symbols are selected
affects computation times. The reason becomes clear when we observe that some
functional symbols never appear as top-level term labels in the axioms (these
functions are called "pure output") while others never appear as sub-terms (they
are "pure input" functions). Pure input functions are uniquely determined when
all other functions are known, and thus should not be explicitly enumerated. On
the other hand, pure output functions should be generated first, because SEM's
propagation is essentially bottom-up.

The other functions, appearing both as top-level terms and as sub-terms in
the problem axioms, are generated in a statically computed order, based
on the number of times a function has a previously generated function as its
input. This heuristic is computed statically (before the search starts) by
building a weighted directed graph from the problem axioms, where nodes are
labelled with functional symbols and weighted arrows represent the number of
nested function invocations in the axioms (f(g(...), ...) either introduces the
arrow from g to f or increments its weight by one). The existence of a static
heuristic clearly improves program performance.

Constants deserve a distinct treatment depending on whether they appear in
equations or disequations. In the former case (like the additive identity zero),
the constants are assigned values before the search starts, and result in
introducing non interchangeable individuals (step 0 of the algorithm). In the
latter case, however (as in an axiom introducing a counter example for
associativity), the constants are better assigned once the functions in which
they appear are entirely generated.

Definition 5. Let T be an equational theory, F its set of functional symbols
(without constants) and C its set of axioms. Let G = (X, W) be the weighted
and oriented dependency graph defined as follows:

- the set of vertices X is isomorphic to F.

- the edge (f_i → f_j) belongs to W if there exists an axiom in C where f_i
  appears as a sub-term of f_j.

- the weight of the edge (f_i → f_j) equals the number of occurrences, over the
  axioms in C, of f_i as a sub-term of f_j.

Example 4. The following theory defines the axioms of a unit ring: g and a
define the additive group and m is the multiplicative law.

    a(0, x) = x                              a(x, 0) = x
    a(g(x), x) = 0                           a(x, g(x)) = 0
    m(x, 1) = x                              m(1, x) = x
    a(x, a(y, z)) = a(a(x, y), z)            m(x, m(y, z)) = m(m(x, y), z)
    a(m(x, y), m(x, z)) = m(x, a(y, z))      a(m(x, z), m(y, z)) = m(a(x, y), z)

This set of axioms yields the following dependency graph:

[Figure: dependency graph of the unit ring theory; surviving labels: g, 4]



Note: the weighted graph construction is not semantical, and depends upon
the theory syntax. Hence, two different formulations of the same problem could
have different graphs. This issue is a subject for future research.
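
The construction of Definition 5 can be sketched in Python on the unit ring
axioms above. The sketch rests on an assumption of ours: "f_i appears as a
sub-term of f_j" is read as "f_i is the head of a direct argument of f_j", and
occurrences are counted on both sides of every axiom. The term encoding and the
names dependency_graph and classify are hypothetical.

```python
from collections import Counter

def subterm_edges(term, weights):
    """For each application f(..., g(...), ...), add one unit to edge g -> f."""
    if not isinstance(term, tuple):
        return
    f = term[0]
    for arg in term[1:]:
        if isinstance(arg, tuple):
            weights[(arg[0], f)] += 1
            subterm_edges(arg, weights)

def dependency_graph(axioms):
    """Return the edge weights as a Counter keyed by (source, target)."""
    weights = Counter()
    for lhs, rhs in axioms:
        subterm_edges(lhs, weights)
        subterm_edges(rhs, weights)
    return weights

def classify(axioms):
    """Return (pure output, pure input) symbols as defined in this section."""
    weights = dependency_graph(axioms)
    top = {side[0] for ax in axioms for side in ax if isinstance(side, tuple)}
    sub = {src for (src, _dst) in weights}
    symbols = top | sub
    return symbols - top, symbols - sub

# Term builders for the unit ring theory of the example.
def a(s, t): return ("a", s, t)
def m(s, t): return ("m", s, t)
def g(s): return ("g", s)

x, y, z = "x", "y", "z"
UNIT_RING = [
    (a(0, x), x), (a(x, 0), x),
    (a(g(x), x), 0), (a(x, g(x)), 0),
    (m(x, 1), x), (m(1, x), x),
    (a(x, a(y, z)), a(a(x, y), z)),
    (m(x, m(y, z)), m(m(x, y), z)),
    (a(m(x, y), m(x, z)), m(x, a(y, z))),
    (a(m(x, z), m(y, z)), m(a(x, y), z)),
]
weights = dependency_graph(UNIT_RING)
pure_output, pure_input = classify(UNIT_RING)
```

Under this reading the edge from m to a has weight 4 and g is the only pure
output symbol, so g would be generated first.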



5.1 Function Selection Algorithm 

The best experimental results are achieved by selecting the next function to 
instantiate according to the following preference order: 

1. select a constant appearing in an equation (step 0) 

2. select a pure output bijective unary function (step 1) 

3. select a pure output unary function (step 1) 

4. select a pure output n-ary function (step 2) 

5. select a function with maximal sum of weights of arrows coming from
already generated functions (step 2)

6. select a constant appearing in already fully generated functions (step 2) 

Note that all vertices having only input edges functionally depend upon the 
other functional symbols, and are uniquely determined as soon as the other func- 
tions have been entirely generated. Pure input functions are thus never generated 
explicitly. 

Example 5. As an example, let us consider the theory RNG041-1 from the TPTP
problem collection [7]. We have the following dependency graph:

[Figure: dependency graph of RNG041-1; surviving labels: g, h, 4]



According to the previous selection strategy, we first select the constants
which appear in equations, then g (pure output, unary bijective), then h
(pure output, unary). Then, the choice of a before m is guided by the
dependencies. Hence the instantiation order is: g, h, a, m.

Table 1 illustrates those concerns by comparing the results obtained with
the TPTP problem RNG041-1 at order 6 using different function orderings.
This example suggests a few comments: the best node complexity is achieved by
selecting function a (the additive law) just after its inverse g. However, the
best execution times are obtained with our function selection strategy, because
many nodes (cell value choices for h in that case) trigger immediate failures.

Table 1. Comparing different orders for the RNG041-1 problem

              g, a, h, m    g, h, a, m    g, m, a, h
    Time            0.13          0.06          0.15
    Nodes             89           879           634
    Models             0             0             0



Example 6. Let us consider the theory RNG025-8 from the TPTP problem col- 
lection [7]. We have the following dependency graph: 




In this case, the statically computed ordering is the following one:
g, a, ass, m, com. Three constants appear in the set of clauses in disequation
literals, and are thus interpreted last.

Again, Table 2 lists the results obtained with different function orderings. The
size of models is equal to 5.



Table 2. Comparing different orders for the RNG025-8 problem

              g, a, ass, m, com    a, m, ass, com,    g, ass, a, m, com    g, m, a, ass, com
    Time                   0.14               0.33          >10 minutes                 0.78
    Nodes                 3 149              1 280                    -                6 398
    Models                3 072              1 280                    -                2 048



The choice of g first is obvious, because it is pure output and unary bijective.
com need not be generated because it is pure input. Choosing a after g is natural
because it has g as input (and com is discarded). Then ass should be preferred
over m because the axioms involve six occurrences of a as a sub-term of ass,
instead of only four in m. Again, one may observe a trade-off between node count
and execution time. As before, the best execution times are obtained
at the expense of more choice nodes, which suggests that many choices lead to
very quick failures in the best option.

6 Experimentation 

We have implemented the XLNH heuristic in the SEM software, as a heuristic
variant. We thus use exactly the same data structures and propagation algorithm.
Our implementation currently only handles problems involving a unary
bijective function, which are numerous. SEM's source code is available at the
web address www.cs.uiowa.edu/~hzhang/sem.html. All results are obtained
on a K6-II 400 MHz with 128 MB of RAM. We limit the time allowed to solve a
problem to two hours.

We compare SEM + XLNH and classical SEM (+ LNH) on different mathematical
problems: abelian groups first, then several ring problems from the
TPTP collection [7]. We have tested XLNH on a large number of these problem
instances and obtained very good results; we list them for three problems:
RNG041-1 (rating 0.22, very easy), RNG025-8 (rating 0.67, medium) and
RNG030-6 (rating 1, difficult).



Table 3. Abelian Groups

                     SEM + XLNH                      SEM + LNH
    Order   Models    Time      Nodes     Models    Time        Nodes
    32         529     168      9 769      2 295     956      421 178
    33          15      28      2 769         15   1 151      466 883
    34           2     239     26 077         20   1 402      481 249
    35          13      41      3 477         13   1 700      490 606
    36         321     375     27 975      2 142   2 345      872 374
    37           1      65      4 107          1   2 848      921 379
    38           2     532     39 789         22   3 525      935 527
    39          17      86      4 350         17   4 263      946 669
    40         282     816     39 130      2 220   5 632    1 393 433
    41           1     116      5 163          +       +            +
    42          42   1 247     54 822          +       +            +
    43           1     154      5 396          +       +            +
    44          31   1 788     58 137          +       +            +
    45         180     226      6 122          +       +            +
    46           2   2 481     60 281          +       +            +
    47           1     361     14 096          +       +            +
    48       3 345   4 446     75 905          +       +            +
    49           8     492     22 976          +       +            +
    50          22   6 375    232 718          +       +            +
    51          21     636     25 053          +       +            +
    52           +       +          +          +       +            +



We generate all models of the abelian group and only solve the satisfiability
problem on the TPTP instances. Table 3 shows the comparison between
LNH and XLNH for abelian groups. On the orders where both complete, our approach
is roughly 6 to 7 times faster for even orders and 40 to 50 times faster for odd
orders. Our method shows that odd order abelian group generation is much easier
than that of even order abelian groups.
This agrees with known results about the number of non isomorphic finite abelian
group instances (it is known that every subgroup order divides the order of the
group). This result shows the efficiency of XLNH. Table 3 stops at
order 52. However we can generate all odd order abelian groups up to order 63
(278 models obtained in 2470 seconds and 40 764 nodes). We can observe that
XLNH never produces more models than LNH, and often far fewer.
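
The time ratios between the two heuristics can be recomputed from Table 3 with
a few lines of Python. This is only an arithmetic sketch: the dictionary
TABLE3_TIMES and the function speedups are our own names, and the values are
the Time columns of Table 3 for the orders 32 to 40, where both variants finish.

```python
# order: (time with SEM + XLNH, time with SEM + LNH), as listed in Table 3
TABLE3_TIMES = {
    32: (168, 956), 33: (28, 1151), 34: (239, 1402),
    35: (41, 1700), 36: (375, 2345), 37: (65, 2848),
    38: (532, 3525), 39: (86, 4263), 40: (816, 5632),
}

def speedups(times):
    """Return, for each order, the ratio LNH time / XLNH time."""
    return {order: lnh / xlnh for order, (xlnh, lnh) in times.items()}

ratios = speedups(TABLE3_TIMES)
even = [round(r, 1) for o, r in sorted(ratios.items()) if o % 2 == 0]
odd = [round(r, 1) for o, r in sorted(ratios.items()) if o % 2 == 1]
```

The ratios are independent of the time unit, so the comparison holds whatever
unit the table uses.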



Table 4. Some TPTP problems

                               XLNH                            LNH
    Problem    Size   Model    Time       Nodes     Model    Time        Nodes
    RNG041-1      8       0    0.05       1 236         0      24      175 672
                  9       0    0.06       1 673         0     123      736 625
                 10       0    0.09       2 049         0     632    3 061 678
                 11       0    0.09       2 497         0   2 928   12 221 898
                 12       0    0.16       2 880         -       -            -
                 14       0    0.22       3 725         -       -            -
                 16       0    0.4        4 584         -       -            -
    RNG025-8      8       1    19       872 609         1     135    6 965 608
                  9       1    2.8       86 629         1      12       92 396
                 10       1    1.6       10 435         1      41       19 276
                 11       1    2.5       14 889         1     221       67 835
                 12       1    8.3      148 203         1     529      244 126
                 13       1    5.4       28 901         1   6 462      985 767
                 14       1    11        39 383         -       -            -
                 15       1    9.5       51 683         -       -            -
                 16       -    -              -         -       -            -
    RNG030-6     11       0    0.76       1 465         0     216      576 977
                 12       0    28.7      83 007         -       -            -
                 13       0    1.5        2 379         -       -            -
                 14       0    3.5        6 250
                 15       0    17        32 987
                 16       0    6 826  4 039 087
                 17       0    5          5 338
                 18       0    278      400 055
                 19       0    8          7 375









Table 4 lists the results obtained for several TPTP problems, sorted by
increasing difficulty, according to the TPTP collection difficulty ratings. The
XLNH heuristic proves very efficient for the RNG class instances (rings with
added axioms). The RNG041-1 problem, an easy one, is solved in less than 1
second at order 16. The table stops here because of paper size limitations.
However XLNH solves RNG041-1 at order 30 in 3.5 seconds and 10 989 nodes. The
program is limited by memory. The RNG025-8 problem shows a difficulty peak
at orders that are a power of two (order 8). This is expected, as in the case of
abelian groups, because there exist many finite groups whose order is a power of
two. XLNH fails at order 16 but solves the problem at order 17 easily
(25 seconds). RNG030-6, a currently open instance of the TPTP collection
(rating 1), exhibits comparable behavior at power of two orders. Here again,
because the problem involves many functions, the limitation comes from memory
rather than combinatorial complexity.

7 Conclusion 

We demonstrate the efficiency of exploiting the underlying structure of
equational theories to generate their finite models. The existence of a pure
output unary function, especially if it is bijective, makes it possible to avoid
exploring many isomorphic subspaces. Our results generalize the least number
heuristic to situations where a unary function exists in the theory. This
generalization proves very efficient in reducing the number of search nodes
explored by the enumerator, and thus produces far fewer isomorphic solutions.

Themes of future work include: the extension to non bijective unary functions,
integration of dynamic symmetry tests, changes in the data structures
to consume less memory and obtain solutions at higher orders, exhaustive
generation of canonical solutions for finite abelian groups or other theories up
to unprecedented orders, and the adaptation of these results to practical CSP
problems involving a functional subpart (and possibly a bijection, as in the
Travelling Salesperson problem).

Abstract. This paper reports recent experimental work in the development
and refinement of the first order theorem prover Scott-5. This
is descended from the Scott (Semantically Constrained Otter) prover
(see Proc. IJCAI 1993, pp. 109-114) and uses the same combination of
a saturation-based theorem prover and a finite domain constraint solver,
but the architecture of Scott-5 is radically different from that of its
ancestor. Here we briefly outline semantic guidance as it occurs in Scott-5,
and give experimental evidence of an improvement in performance (in
terms of efficiency) that we attribute to the guidance strategy.



1 Introduction 

Question: What semantically oriented strategy can direct a reasoning program
in its choice of clauses to which to apply the inference rule(s) being employed:
what properties can be used other than the current criterion of simply finding
a clause containing a literal with the appropriately signed predicate?

L. Wos [8] (problem #5) 

In [8] Wos identifies “inadequate focus” as one of the primary obstacles to ef- 
fective theorem proving. In this paper we report one line of attack on the focus 
problem for saturation methods of first order theorem proving, by injecting se- 
mantic information into heuristics for ordering the possible inferences. Prelimi- 
nary work on this idea^ resulted in the system Scott [1,5,7] which showed some 
modest performance gains relative to its parent Otter. However, the main tech- 
nique used in that prover was model resolution; the work on false preference (see 
below) remained unsystematic and lacked a theoretical basis. The new Scott 
rests on a new understanding of semantic guidance and shows much more stable 
behaviour over a wide range of problems. We present results on the TPTP prob- 
lems and performance under fair conditions in CASC as compelling evidence 
that the effects exploited by our technique are real and useful. 

Preliminary versions of the present work were presented in the workshops 
FTP-2000 and Reunion Workshop on Implementations on Logic (at LPAR-2000) 
but as these had no published proceedings, no account of the new Scott has 
yet been published. 

¹ Ours is not, of course, the only approach represented in the literature. Semantically 
based refinements of resolution go back at least to Slagle’s early work [4]. Mention 
should also be made of Plaisted’s semantic hyper-linking [3] which was developed 
independently of Scott and also uses information from models in first order proof 
search. 
2 Semantic Guidance 

At any point during a saturation-based proof search, let S be a consistent subset 
of the clauses derived so far and let c be a particular clause inconsistent with S. 
Then there are derivations of the empty clause from S ∪ {c}, and c occurs in every 
such derivation, including the shortest ones and those which contain no irrelevant 
excursions. Therefore if the next clause chosen as parent for inferences is c, at 
least one proof (and possibly many) will be extended by it. If we knew which 
were the maximal consistent subsets of the clauses so far derived, we could choose 
given clauses from the complements of these sets, thus guaranteeing at every step 
that some proof fragment is extended. Naturally, this would not guarantee that 
the proofs extended at a given point were globally the shortest ones, nor that one 
of them would be the first proof eventually found: screening out excess proofs is 
a different issue, which we do not claim to have addressed. 

Unfortunately, we do not know which sets are maximal consistent, for con- 
sistency is famously undecidable. However, that does not mean we can never 
detect it: for instance, a formula may happen to have a model in a domain of 
just three elements, in which case it may be rather easy to find that model and 
so establish consistency. We hope to show that a search for (small) finite models 
can usefully approximate a satisfiability oracle. By relying on it, we give our 
prover access to Near-Maximal Consistent Sets (NMCS), which may be a little 
less than maximal, but which are still consistent and which can still focus the 
search for the most part on proofs rather than on useless sequences of inferences. 

Evidently, there is a tradeoff between time spent in the semantic component, 
searching for models, and time spent in the syntactic component, searching for 
proofs. Investing time in modelling tends to improve the quality of semantic 
guidance and therefore to increase the efficiency of our proof search,² but on the 
other hand our goal is a proof not a model, so at some point we have to stop 
modelling and risk making some inferences. The tradeoff between the quality 
of guidance and its cost has three aspects. Firstly, the model generator must 
be forced to terminate by means of a bound on its search; the more generous 
this bound the closer the NMCS approximate MCS, but generosity costs time so 
the bound must be set judiciously. We find that sensible bounds can usually be 
set dynamically during the search using information from previous attempts at 
modelling clauses. Secondly, the probability that a given model can be improved 
within the search bound falls to near zero at some point, whereupon we can 
stop trying to improve that model and treat the set of clauses true in it as 
sufficiently near maximal to count as the NMCS for guidance purposes. Thirdly, 
it is important to bound the number of NMCS maintained. The more NMCS the 
system maintains the better its coverage of the search space, but each NMCS 
incurs severe overheads. 

It may not be possible to resolve all these tradeoffs in a uniform way. The 
present program SCOTT-5 uses limits derived experimentally over three years 
of development work with successive versions of the algorithm. 

² I.e. the proportion of given clause selections that actually occur in the final proof. 
3 Implementation 

We assume familiarity with the given clause algorithm used by Scott and 
by most contemporary high-performance theorem provers.³ As suggested above, 
Scott’s technique is to select given clauses almost always from a collection 
of co-NMCS (complements of near-maximal consistent sets). To identify these 
NMCS and witness their consistency, it uses models generated at need by a fi- 
nite domain constraint solver [6] which thus functions as a semi-oracle yielding 
no false positives but an unknown proportion of false negatives. The underlying 
theorem prover is Otter. 

Where S is any consistent set, clearly every derivation of the empty clause 
contains at least one complete branch of formulae, right up to an input formula as 
a leaf, which is disjoint from S. Hence, if the intersection of some co-NMCS with 
the passive set is small, it makes sense to take all the clauses in that intersection 
as the next few given clauses, so that we have activated as much as possible of 
at least one branch of every proof. We call this the “semantic queue” strategy. 
The meaning of “small” for this purpose is determined pragmatically; it seems 
from our observations that, roughly speaking, single-digit numbers are “small”. 

In order to enforce fairness, and because there is no easy way to know whether 
a given NMCS is a good one to use for guidance, Scott does not rely exclusively 
on the semantic queue strategy. Where all the co-NMCS are “large”, it lets the 
choice of given clause cycle through them. Within each co-NMCS, the clauses 
are ordered by weight and age as normal, so that the choice among them is 
fair. Moreover, because the cost of maintaining all NMCS would be prohibitive, 
and again to ensure complete fairness, it occasionally chooses a given clause from 
among those which are in all known NMCS. Thus Scott’s given clause selection 
strategy is fair if Otter’s is. 
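
The selection policy just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Scott's actual code: the clause representation (weight, age, name), the function name select_given, the cut-off SMALL = 9 and the fairness period fair_every are all hypothetical choices made for the example.

```python
SMALL = 9  # assumed cut-off, reading "single-digit numbers are small"


def by_weight_then_age(clause):
    weight, age, _name = clause
    return (weight, age)


def select_given(passive, co_nmcs_list, turn, fair_every=5):
    """Return the clauses to activate next (hypothetical interface).

    passive      -- set of (weight, age, name) clauses not yet selected
    co_nmcs_list -- list of sets, each the complement of one NMCS
    turn         -- running count of selections, used for cycling
    fair_every   -- assumed period of the fairness step
    """
    # Fairness step: occasionally pick a clause that lies in every known NMCS.
    if turn % fair_every == fair_every - 1:
        in_all_nmcs = passive.difference(*co_nmcs_list)
        if in_all_nmcs:
            return [min(in_all_nmcs, key=by_weight_then_age)]
    queues = [sorted(co & passive, key=by_weight_then_age) for co in co_nmcs_list]
    queues = [q for q in queues if q]
    small = [q for q in queues if len(q) <= SMALL]
    if small:
        # Semantic queue: activate a whole small intersection at once.
        return min(small, key=len)
    if queues:
        # All co-NMCS are large: cycle through them, best clause first.
        return queues[turn % len(queues)][:1]
    # No co-NMCS meets the passive set: fall back to plain weight/age order.
    return sorted(passive, key=by_weight_then_age)[:1]
```

The function returns a list because the semantic queue activates several clauses in a row, whereas the other branches yield a single given clause.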

In principle, generating a NMCS is straightforward: simply scan the list of 
clauses, adding a clause whenever the resultant set can be shown consistent by 
finding a model. By listing the clauses in different orders, different NMCS may 
be obtained. In Scott this process is dynamic, since the NMCS change as more 
clauses are deduced, and so the witnessing models may also be changed during 
the search. It has always been part of the concept of Scott that semantic guid- 
ance should be fitted to the particular proof search, being driven by the actual 
clauses deduced rather than being set up in advance. The algorithm for generat- 
ing models as a by-product of labelling the “kept” clauses is the same as that of 
the earlier Scott [1,5,7] and will not be repeated here for reasons of space. Nor 
will the algorithm and the refinements of it due to extensive experimentation be 
further elaborated. 
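
As a rough illustration of the "scan and add" idea only (not the labelling algorithm of Scott, which is not given here), the following Python sketch builds a near-maximal consistent set greedily. It makes strong simplifying assumptions: clauses are propositional (tuples of non-zero integers, negative meaning negated), and a bounded brute-force search stands in for the finite domain constraint solver. The names find_model, near_maximal_consistent_set and max_tries are hypothetical.

```python
from itertools import product


def find_model(clauses, atoms, max_tries=10_000):
    """Bounded brute-force model search.

    Returns a satisfying assignment, or None. None may be a false
    negative when the bound is hit, mirroring the "semi-oracle" role.
    """
    for tries, values in enumerate(product([False, True], repeat=len(atoms))):
        if tries >= max_tries:
            return None
        model = dict(zip(atoms, values))
        if all(any(model[abs(lit)] == (lit > 0) for lit in c) for c in clauses):
            return model
    return None


def near_maximal_consistent_set(clauses, max_tries=10_000):
    """Scan the clauses in the given order, keeping each one that can be
    added while a witnessing model is still found within the bound."""
    atoms = sorted({abs(lit) for c in clauses for lit in c})
    nmcs, witness = [], {a: False for a in atoms}
    for c in clauses:
        model = find_model(nmcs + [c], atoms, max_tries)
        if model is not None:
            nmcs.append(c)
            witness = model
    return nmcs, witness


clauses = [(1,), (-1, 2), (-2,), (3,)]
forward, _ = near_maximal_consistent_set(clauses)
backward, _ = near_maximal_consistent_set(list(reversed(clauses)))
```

Here forward keeps (1,), (-1, 2), (3,) and rejects (-2,), while backward keeps (3,), (-2,), (-1, 2) and rejects (1,): listing the clauses in a different order yields a different NMCS, as noted above.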



³ See, for instance, page 5 of [2]. For historical reasons, Scott uses the “Otter 
loop”, in which certain simplification inferences are performed eagerly within the 
passive set, rather than the “Discount (or Waldmeister) loop” in which all such 
inference is done lazily. We do not see the difference between these two versions of 
the algorithm as very important to Scott. 
Fig. 1. Comparison of Scott and Otter. Left plot shows time in seconds to solution. 
The problems are in each case those from TPTP that the prover solves within 600 
seconds, and the times are plotted in monotone increasing order. Right plot shows 
difference in efficiency ratings (number of clauses in proof over number of given clauses) 
between the two provers. Above the line: Scott more efficient; below the line: Otter 
more efficient. Problems are those solved in 600 seconds by both provers, omitting those 
solved by both with under 10 given clauses, in TPTP order. 



4 Performance 

Performance measures are inevitably rough, since Scott does not have a highly 
tuned autonomous mode because the algorithm is still under development and 
its parameters are so many and varied. Nonetheless we give comparisons against 
Otter (actually against Scott with its semantic features turned off) to il- 
lustrate the effects of the guidance strategy, and a summary of the fair and 
independent comparison against the best contemporary provers in CASC-17. 

The time plot shows Scott’s comparatively gentle decline in performance as 
the problems get harder. Among high-performance provers to date, only Setheo 
seems to show a similar performance profile. The scatter plot shows Scott 
emerging ahead of Otter in terms of search efficiency, but also illustrates that 
the effect is far from uniform across problems. For reasons of space, we omit a 
more detailed breakdown of the results. 

The above comparisons are exclusively with Otter. However, it is also inter- 
esting to compare the performance of Scott with other, more recent, theorem 
provers. To that end, we instance some results from CASC, in which Scott has 
competed for the last four years. In the general MIX division, its performance 
has been little better than that of Otter, mainly because of the difficulty of devising 
an appropriate set of defaults for its many parameters. On the other hand, in the 
UEQ (unit equality) division, where problem sets are relatively homogeneous, 
it has performed rather well even in comparison with highly engineered provers 
such as E and Waldmeister. The full results of CASC-17 (July 2000) are available 
on the CASC web page,⁴ so here we reproduce only summaries of the MIX 
and UEQ results (Figures 2(a) and 2(b)). 

⁴ http://www.cs.jcu.edu.au/~tptp/CASC 
System               Problems Solved   Average Time 

E 0.6                      57              79.31 
E-SETHEO 2000CSP           57             160.53 
Gandalf c-2.1              55              99.60 
Vampire 1.0                45              48.06 
Vampire 0.0                37              54.72 
SCOTT 5.0.0                22             178.06 
Bliksem 1.10               18              65.33 
Otter 3.1b                  8              55.86 

(a) MIX division 


System               Problems Solved   Average Time 

Waldmeister 600            30              43.45 
SCOTT 5.0.0                12             186.27 
E 0.6                       8              77.85 
E-SETHEO 2000CSP            8             190.18 
Vampire 1.0                 7              89.06 
Otter 3.1b                  6              33.42 
Gandalf c-2.1               6             100.80 
Bliksem 1.10                4              28.05 

(b) UEQ division 


Fig. 2. Summary of CASC-17 results 
1 A Modified Clause-Diffusion Prover with Multi-search 

Peers-mcd.d implements contraction-based strategies for equational logic, mo- 
dulo associativity and commutativity, with paramodulation, simplification and 
functional subsumption. It is a new version of Peers-mcd [4], that parallelizes 
McCune’s prover EQP (version 0.9d), according to the Modified Clause-Diffusion 
methodology (http://www.cs.uiowa.edu/~bonacina/cd.html). 

In parallel search with peer processes (no master-slave hierarchy), multiple 
deductive processes search the space of the theorem-proving problem, each deve- 
loping its own data base and derivation, and cooperate through communication 
of data, until one finds a proof and all halt. Within parallel search, we distin- 
guish between distributed search, where the searches generated by the processes 
are differentiated by subdividing the inferences among them, and multi-search, 
where they are differentiated by assigning different search plans to the processes. 
Most approaches to parallel search in theorem proving adopted either one or the 
other: for instance, the systems based on Team-Work and combination of ho- 
mogeneous provers emphasized multi-search, while the previous Clause-Diffusion 
provers emphasized distributed search (see [6] for a survey and references). A 
major difference between Peers-mcd.d and all its predecessors is that Peers-mcd.d 
implements both distributed search and multi-search, and their combination. 
Peers-mcd.d can run in one of three modes: 

— Pure distributed-search mode: the search space is subdivided among the pro- 
cesses; all processes execute the same search plan. 

— Pure multi-search mode: the search space is not subdivided; every process 
executes a different search plan. 

— Hybrid mode: the search space is subdivided, and the processes execute 
different search plans. 

The basic structure of the search plan in a Peers-mcd.d process is to select 
premises for expansion (paramodulation), normalize the generated equations 
(forward contraction), and apply them to normalize pre-existing equations 

* Supported in part by the National Science Foundation with grants CCR-97-01508 
and EIA-97-29807. 
(backward contraction), in such a way as to keep the data base inter-reduced. 
Peers-mcd.d offers three ways of diversifying the search plan: 

— Different premise selection mechanism, 

— Different ratio of breadth-first search and best-first search, and 

— Different heuristic function to sort equations for premise selection. 

Peers-mcd.d inherits from EQP two mechanisms to select the premises for 
paramodulation, the given-clause algorithm and the pair algorithm. The first 
one is a best-first search with the weight of equations as heuristic function: 
at every selection extract an equation of smallest weight and generate all its 
paramodulants with the already selected equations. The second one is a best- 
first search on pairs: at every selection extract a pair of equations of smallest 
weight and generate all their paramodulants. The most basic way of introducing 
multi-search is to have some processes execute the given-clause and some the pair 
algorithm: in Peers-mcd.d, when the flag diverse-sel is set, the even-numbered 
processes execute the pair algorithm, and the odd-numbered processes execute 
the given-clause algorithm. 

If the parameter pick-given-ratio has value x, the given-clause/pair algorithm 
picks the oldest, rather than lightest, equation/pair once every x + 1 
choices. The second way of diversifying search plans in Peers-mcd.d is to let each 
process use a different value of pick-given-ratio: when the flag diverse-pick 
is set, process p_k resets its pick-given-ratio to x + k. 
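
A minimal Python sketch of this selection rule follows. It is not taken from EQP or Peers-mcd.d: the queue representation as (weight, age) pairs, the function names, and the choice of which position within each block of x + 1 selections goes to the oldest item are assumptions made for illustration.

```python
def pick(queue, choice_index, ratio):
    """Remove and return the next item from queue.

    queue        -- list of (weight, age) pairs; a smaller age means older
    choice_index -- number of selections made so far
    ratio        -- value x of pick-given-ratio, or None if unused

    With ratio x, one selection in every x + 1 takes the oldest item;
    all other selections take the lightest (ties broken by age).
    """
    if ratio is not None and choice_index % (ratio + 1) == ratio:
        chosen = min(queue, key=lambda item: item[1])
    else:
        chosen = min(queue, key=lambda item: (item[0], item[1]))
    queue.remove(chosen)
    return chosen


def ratio_for_process(x, k, diverse_pick):
    """Value of pick-given-ratio used by process p_k."""
    return x + k if diverse_pick else x


queue = [(5, 0), (1, 1), (3, 2), (2, 3)]
order = [pick(queue, i, ratio=2) for i in range(4)]
```

With ratio 2 the third selection takes the oldest item (5, 0) even though lighter ones remain, which is the breadth-first ingredient that keeps heavy equations from being postponed indefinitely.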

The third ingredient to obtain different search plans is to let the processes do 
best-first search with different heuristic functions. The heuristic functions of [1,7] 
measure the syntactic similarity between an equation and the target theorem(s): 
the higher the similarity, the better the heuristic value, since an equation similar 
to the goal might reduce it. Peers-mcd.d implements the heuristic functions 
occ-nest, CP-in-goal¹ and goal-in-CP of [7], except that it uses the measure mo 
of [1] for the number of occurrences of a function symbol in a term, to take 
into account that AC operators are varyadic, since terms under AC operators 
are flattened. When the flag heuristic-search is set, process p_k executes the 
given-clause algorithm with heuristic function occ-nest if k mod 3 = 0, CP-in- 
goal if k mod 3 = 1, and goal-in-CP if k mod 3 = 2. The pair algorithm does not 
use these heuristic functions, because they are defined for equations, not pairs. 

The search space is subdivided by subdividing the generated equations among 
the processes. This is achieved without a top-level scheduler: whenever a process 
generates and keeps an equation (i.e., the equation is not deleted by forward 
contraction), it gives it a process number, which becomes part of the equation’s 
identifier (see [5] for details). This induces a subdivision of inferences, because 
each process skips the steps that it knows are done by others based on the 
identifiers of the premises. All inferences that generate new clauses, including 
backward-contraction, are thus subdivided, while deletions are not. Each pro- 
cess broadcasts the equations it has generated and kept after normalization. In 
Peers-mcd.d, the parameter decide-owner-strat, that controls the choice of 

¹ CP stands for critical pair, hence equation. 
subdivision criterion, may also have the value no-subdivide, meaning that no 
subdivision occurs, and a process broadcasts an equation only if its weight (its 
heuristic value if heuristic-search is set) is lower than a given parameter. 

In summary, if decide-owner-strat = no-subdivide, and at least one of 
diverse-sel, diverse-pick and heuristic-search is set, Peers-mcd.d runs in 
pure multi-search mode; if decide-owner-strat ≠ no-subdivide, and none of 
diverse-sel, diverse-pick and heuristic-search is set, Peers-mcd.d runs in 
pure distributed-search mode; if decide-owner-strat ≠ no-subdivide, and at 
least one of diverse-sel, diverse-pick and heuristic-search is set, 
Peers-mcd.d runs in hybrid mode. 
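
The case analysis above, together with the per-process assignments described earlier, can be written out as a small Python sketch. The function names are hypothetical and "some-criterion" in the usage lines is a placeholder, since the names of the actual subdivision criteria are not listed here; only the value no-subdivide comes from the text.

```python
def run_mode(decide_owner_strat, diverse_sel=False, diverse_pick=False,
             heuristic_search=False):
    """Classify a Peers-mcd.d configuration according to the summary."""
    subdivided = decide_owner_strat != "no-subdivide"
    diversified = diverse_sel or diverse_pick or heuristic_search
    if subdivided and diversified:
        return "hybrid"
    if subdivided:
        return "pure distributed-search"
    if diversified:
        return "pure multi-search"
    return None  # no subdivision and no diversification: not covered above


def selection_for(k):
    """Premise selection of process p_k when diverse-sel is set."""
    return "pair" if k % 2 == 0 else "given-clause"


def heuristic_for(k):
    """Heuristic function of process p_k when heuristic-search is set."""
    return ("occ-nest", "CP-in-goal", "goal-in-CP")[k % 3]


modes = [
    run_mode("no-subdivide", heuristic_search=True),
    run_mode("some-criterion"),
    run_mode("some-criterion", diverse_pick=True),
]
```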

2 Proofs of the Moufang Identities without Cancellation 

The first automated proofs of the Moufang identities in alternative (i.e., non- 
associative) rings by a general-purpose prover were presented in [2]. They used 
AC-UKB, the inequality ordered-saturation inference rule (i.e., superposition of 
an un-orientable equation into a goal to generate a new goal which is kept only 
if its normal form is not greater than or equal to an already existing inequality), 
inference rules that build the cancellation laws in [8], and the heuristic measures of 
[1] to sort equations and delete those whose heuristic value is worse than a given 
threshold. These problems are still used as benchmarks (e.g., [3]) and in com- 
petitions (e.g., [9]). The TPTP library presents them in different formulations: 
some differ from [2] in choice of axioms and/or conjecture; those that follow [2] 
include the cancellation laws as implications, so that they are not equational. In 
the experiments reported here, the problems were formulated as in [2], but with- 
out cancellation laws, since EQP and Peers-mcd.d do not implement the rules 
of [8], and they are purely equational provers which cannot handle implications. 

In the following tables, the first column gives the mode: D for pure 
distributed-search mode and H for hybrid mode. The second column gives the 
search plan: given-clause, or pair, or diverse if diverse-sel was set. The h means 
that heuristic-search was used. The number at the front is the pick-given-ratio: 
x if it was x for all processes, xd if diverse-pick was set and process p_k used 
x + k, nothing if pick-given-ratio was not used. The number in parentheses 
is the value of max-weight, if deletion by weight was used. The times (expressed 
in sec) are average CPU times. For each search plan, five subdivision criteria 
were tried, and the best result (among the averages) was retained. “T” means 
time-out after 3600 sec. The workstations were HP B2000 or C360, with 1G 
or 512M of memory, with EQP0.9d running on a B2000 with 1G, and N-Peers 
(Peers-mcd.d with N processes) on N workstations, one per process. 

The first two problems, moufang1 (Middle Alternative Law) and moufang2 
(Skew-Symmetry Relation of the Associator), are too easy for parallelization: 
EQP0.9d proved them in 4 and 1 sec, respectively, using the pair algorithm with 
pick-given-ratio = 4. However, with the default search plan, namely the 
given-clause algorithm and no pick-given-ratio, EQP0.9d terminated abnormally², 

² Some constant in the AC-matching or AC-unification code of EQP was exceeded. 
whereas 1-Peer did both problems in 1 sec, due to the heuristic function used by 
the given-clause algorithm. 

For the Left Moufang Identity (moufang3), EQP0.9d could not find a proof 
with the default search plan, while Peers-mcd.d did, thanks to distributed search: 



Mode  Search plan        EQP0.9d  1-Peer  2-Peers  4-Peers  6-Peers  8-Peers 

D     given(32)          T        T       598      91       187      40 
H     given-h(32)        T        415     230      57       42       9 
D     pair(32)           3,215    3,277   551      109      51       83 
D     4-pair(32)         956      1,068   126      38       56       58 
D     2-pair(32)         88       130     66       39       109      25 
H     2d-diverse-h(32)   88       147     84       75       41       25 



With heuristic-search on (second row), also 1-Peer found a proof, which 
shows the merit of the heuristic function, and all other times improved, up to 
a proof in only 9 sec with 8-Peers. EQP found a proof with the pair algorithm 
(third row), and the sequential time was reduced with pick-given-ratio = 
4 (fourth row), but Peers-mcd.d with more than one node sped up with these 
search plans also, finding a proof in 38 sec with 4 processes. The best sequential 
time was obtained with pick-given-ratio = 2 (last two rows): with this value, 
the parallel prover behaved more smoothly in hybrid mode. 

For the Right Moufang Identity (moufang4), EQP found a proof only with 
the pair algorithm and pick-given-ratio = 4: 



Mode  Search plan        EQP0.9d  1-Peer  2-Peers  4-Peers  6-Peers  8-Peers 

H     given-h(32)        T        437     268      162      100      28 
D     pair(32)           T        T       865      356      161      105 
H     4d-diverse-h(32)   1,558    1,638   75       32       27       47 



The problem proved to be elusive for the default search plan, but with 
heuristic-search on (first row), Peers-mcd.d solved it, with run-time decreas- 
ing down to 28 sec for 8-Peers. With the pair algorithm and pick-given-ratio 
not set (second row), 1-Peer did like EQP, since the pair algorithm does not use 
the heuristic function, but the parallel prover succeeded. With the hybrid search 
plan 4d-diverse-h, Peers-mcd.d exhibited super-linear speed-up for all numbers 
of processes, with the best result for 4-Peers: the speed-up was 1,558/32 = 48.68 
and the efficiency 48.68/4 = 12.17. 
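
The arithmetic behind these figures can be checked with a short Python sketch. It assumes the usual definitions, speed-up = sequential time / parallel time and efficiency = speed-up / number of processes, and takes the times from the 4d-diverse-h row of the table above; the function names are ours.

```python
def speedup(t_sequential, t_parallel):
    return t_sequential / t_parallel


def efficiency(t_sequential, t_parallel, n_processes):
    return speedup(t_sequential, t_parallel) / n_processes


EQP_TIME = 1558  # sec, EQP0.9d with search plan 4d-diverse-h
PEER_TIMES = {2: 75, 4: 32, 6: 27, 8: 47}  # sec, N-Peers, same row

# Efficiency above 1 for a given N means super-linear speed-up.
efficiencies = {n: efficiency(EQP_TIME, t, n) for n, t in PEER_TIMES.items()}
superlinear = all(e > 1 for e in efficiencies.values())
```

For 4-Peers the exact quotient is 1558/32 = 48.6875, which the text quotes truncated to 48.68, giving an efficiency of about 12.17.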

For the Middle Moufang Identity (moufang5), EQP could not find a proof 
within 3,600 sec with the default search plan, and took 572 sec with the pair 
algorithm, while Peers-mcd.d was much faster: using the default search plan, but 
with heuristic-search on, hence in hybrid mode, 1-Peer found a proof in 16 
sec, 2-Peers took 9 sec and 4-Peers only 5 sec. 
Problems moufang3, moufang4 and moufang5 were also tried in pure 
multi-search mode, with the same search plans tried in hybrid mode, but no subdivision 
and equation broadcasting limited by heuristic value. Almost no speed-up was 
observed. Thus, distributed search did much better than multi-search on these 
problems, and the combination of the two did even better. This suggests that a 
key factor in parallel search, possibly even more basic than limiting communi- 
cation, is to differentiate the processes, so that they do not overlap and explore 
different parts of the search space. The statistics showed that a speed-up is typ- 
ically accompanied by a strong reduction in number of equations generated (for 
Peers-mcd.d, the sum of the equations generated by all peers), hinting that the 
subdivision was effective, and led the processes to generate different searches 
and different from the sequential one. 

Directions for future work include the development of a Modified Clause- 
Diffusion prover for first-order logic with equality, to allow application to a larger 
class of problems. 

Acknowledgements. Thanks to Bill McCune for EQP0.9d, to my former student 
Javeed Chida, for implementing the heuristic functions in his master's thesis, 
and to Gigina Carlucci Aiello of the Dipartimento di Informatica e Sistemistica, 
Università di Roma “La Sapienza,” where part of this work was done. 
1 A Manifesto for a Generic Tableau Prover 

The last years have seen a renewed interest in modal and description logics 
(MDLs). Better algorithms, coding, and technology have led to effective systems, 
ranging from those based on tableau and constraint systems [6,7] to DPLL-based 
implementations [5], first order provers [8] and the inverse method [13]. PSPACE 
problems such as satisfiability are within reach for realistic instances [10] and 
potentially EXPTIME problems stemming from real applications can also be solved [3,7]. 

However, the comparisons now held at the Description Logic workshops and 
at the TABLEAUX conferences have also shown a major problem: the emphasis 
on performance is so strong that most implementors have restricted their prover 
to few fixed logics, hacking logics and strategies in their systems. 

Yet, there are infinitely many MDLs and the choice of one logic over another 
is driven by modeling needs and computational constraints of one’s applications. 
A logic about actions and plans is likely to have different semantical and compu- 
tational properties from a logic about database schemata. Even with the same 
logic, different search strategies may be needed for different applications. 

If a user wants to use logics or even search strategies slightly different from 
those of the current systems, he must hack his own prover. “What if I use 
this constructor”, “What if I change the order of rules” experiments are almost 
impossible for somebody who is not the implementor of the system. 

To answer the needs of users wishing to experiment and model with different 
logics or strategies there is a need for a generic theorem prover for MDLs: a 
prover playing the same role as Isabelle [12] or PVS [11] for higher order logics, 
while being less complex. If the user is not the same person as the programmer 
of the prover, one needs (a) flexibility and portability of the implementation, 
(b) high-level languages for tableau rules and strategy definition, and 
(c) user-friendly interfaces. 

Lotrec is such a generic tableau prover. It aims at covering all logics having 
possible worlds semantics, in particular MDLs¹. 

¹ Behind Lotrec is the work on modal tableaux with back/forth rules [9], graphs [1,2], 
and its DL counterpart in [7]. Lotrec has been implemented by D. Fauthoux [4]. 
Fig. 1. Lotrec presentation of a tableau for logic K4 



2 Architecture 

The aims of flexibility, portability, and nice interfaces motivated the choice of 
Java as the implementation language. Within such an object-based programming 
language, Lotrec raises Java’s event-based architecture to a declarative approach. 

Tableaux are usually presented in tree form. In Lotrec, they are generalized 
to graphs in order to enable complex MDLs such as the ML of confluence, or 
MLs with complex interactions between knowledge and action. Graphs also allow 
to visualize possible worlds models (e.g. after transitive or symmetric closure of 
accessibility relations). Graph nodes are labelled by formulae, and edges by any 
term (possibly containing variables). Lotrec graphically presents the tableaux it 
has generated (Fig. 1), and allows for “drag-and-drop” restructuring of its shape 
by the user. (It remains to implement “drag-and-drop” interfaces for defining 
rules, strategies, etc. As in Isabelle and PVS, this is currently done via textual files.) 



3 Defining the Language of Your Own Pet Logic... 

Before starting to define the rules, the user must define the logical connectives 
he wants to use. Let us take the definition for a logic of actions: 



connector  falsum    0  true  "FALSUM"  4 
connector  and       2  true            3 
connector  not       1  true  "~ _"     5 
connector  feasible  2  true            4 
connector  after     2  true            4 



Consider e.g. the last definition: after is the internal name of the connective, 
and 2 is the number of its arguments. The rest of the parameters defines the 
graphical presentation: true means that the connective is associative, 
stipulates that the internal (after hit (feasible smash broken)) will be 
rule "K" 
    descriptor links node0 node1 (variable R) 
    descriptor hasElement node0 (nec (variable R) (variable A)) 
    action add node1 (variable A) 
end 

rule "diamond" 
    descriptor hasElement node0 (not (nec (variable R) (variable A))) 
    descriptor isNotMarked node0 CONTAINED 
    action newNode node0 node1 
    action link node0 node1 (variable R) 
    action add node1 (not (variable A)) 
end 

rule "InclusionTest" 
    descriptor isAncestor node0 node1 
    descriptor contains node0 node1 
    descriptor isNotMarked node1 CONTAINED 
    action mark node1 CONTAINED 
end 

Fig. 2. Some possible rules for the modal logic K4 



written [hit] <smash>broken on the screen. 4 is the priority of the connective 
wrt the others. 

4 Defining the Semantic-Tableau Rules of Your Pet 
Logic... 

A rule consists of a descriptor and an action part. The former contains the 
applicability conditions, while the latter contains operations on tableaux. 

A tableau rule is interpreted as mapping a pattern to a pattern, where pat- 
terns are connected fragments of a given tableau: if the descriptor part matches 
the pattern then that pattern is replaced by the result of the action part. 

Consider the standard multi-modal logic K4_n, with modal operator nec. A 
handful of rules is in Fig. 2. The rule for handling formulae of the form nec A is 
the rule K. It says that if some node node0 of a Kripke structure is linked to a node 
node1 via the relation R and contains a formula of the form nec R A, then A is 
added to node1. R and A must be variables in order to make the rule work as a 
schema. Constants are useful for specific formulae or relations. 
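
To make the reading of rule K concrete, here is a minimal Python sketch of the same pattern-and-action step on a labelled graph. It is not Lotrec code (Lotrec is implemented in Java and rules are written in its own rule language); the data layout, with nodes as a dictionary of formula sets and links as (source, target, relation) triples, and the function name apply_rule_k are assumptions made for illustration.

```python
def apply_rule_k(nodes, links):
    """Apply rule K once to every matching pattern.

    nodes -- dict mapping a node name to its set of formulae; a boxed
             formula is the tuple ("nec", R, A)
    links -- set of (source, target, relation) triples

    Descriptor part: source is linked to target via R and contains nec R A.
    Action part: add A to target.
    Returns True if any formula was added.
    """
    changed = False
    for source, target, relation in links:
        for formula in list(nodes[source]):
            is_nec = isinstance(formula, tuple) and len(formula) == 3 \
                and formula[0] == "nec"
            if is_nec and formula[1] == relation and formula[2] not in nodes[target]:
                nodes[target].add(formula[2])
                changed = True
    return changed


nodes = {"node0": {("nec", "R", "p")}, "node1": set()}
links = {("node0", "node1", "R")}
first = apply_rule_k(nodes, links)   # adds p to node1
second = apply_rule_k(nodes, links)  # nothing left to add
```

Because R and A are matched against whatever labels occur in the graph, the same function covers every relation and every formula, which is the role the variables play in the schema.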

Lotrec also allows for manipulating expressions on links. Thus one may easily 
have logics like dynamic logics where links are labelled by complex programs. 

Sometimes the ordered application of rules via a strategy is not enough to 
ensure termination or completeness; or maybe a user just wants to test various 
strategies. Thus, nodes, links, and formulae in nodes can be marked. For instance, 
for logics with transitive accessibility relations such as K4, before creating a new 
node we may wish to check whether the current node is not included in some 
ancestor. The rules diamond and InclusionTest in Fig. 2 do that. 

We may want to do more than just simple propositional reasoning and we 
may have what are called concrete domains or quantitative domains. Then we 
allow for oracle calls to programs exterior to Lotrec. These programs typically 
are rewriting procedures, constraint solvers, SAT provers, etc. 

5 Defining Your Pet Search Strategies... 

After you have the rules defining the semantics of your logics you may want to say 
how to combine and apply them. Search Strategies do exactly that by mapping 
tableaux to tableaux (or sets thereof for disjunction-like rules) by repeatedly 
applying rules in some suitable ways. 

If a user has a set of tableau rules {rule1, rule2, ..., ruleN} that has 
been proven to be complete for the logic under concern, then he can immediately 
implement a complete theorem prover for this logic via a fair strategy, which 
repeats applying all rules sequentially. Such a naive strategy is written: 

repeat allRules rule1; rule2; ... ; ruleN end end 

Here, to apply a rule means to apply the rule simultaneously to every possible 
pattern in the tableau. For our rule "K" this means simultaneous application to 
every formula of every node. 

Your pet logic may require more sophisticated strategies for termination, or 
completeness, soundness etc. or, again, you may just want to experiment. Thus 
we allow for search strategy programming with the following constructs: 

strategy ::= rule | 
             repeat strategy end | 
             allRules strategy1; strategy2; ... ; strategyN end | 
             firstRule strategy1; strategy2; ... ; strategyN end 

We use firstRule rule1; rule2; rule3 when we want to apply the first applicable 
rule, and we use allRules rule1; rule2; rule3 to apply all applicable 
rules among rule1, rule2, ... in that order. For instance, if rule1 and rule3 
are the applicable rules then firstRule will only apply rule1, whereas allRules 
will apply first rule1 and then rule3 to the result of the first rule. There is a 
close similarity with Isabelle “tacticals” FIRST and EVERY for combining tactics. 
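
The behaviour of the three constructs can be illustrated with a small Python sketch. It is not Lotrec code: a "tableau" is reduced to a tuple of labels, a rule is a hypothetical function that returns the new state or None when it is not applicable, and disjunction-like rules (which yield sets of tableaux) are ignored.

```python
from typing import Callable, Optional, Tuple

State = Tuple[str, ...]                          # toy stand-in for a tableau
Strategy = Callable[[State], Optional[State]]    # None means "not applicable"


def first_rule(*strategies: Strategy) -> Strategy:
    """Apply only the first applicable strategy."""
    def run(state: State) -> Optional[State]:
        for strategy in strategies:
            result = strategy(state)
            if result is not None:
                return result
        return None
    return run


def all_rules(*strategies: Strategy) -> Strategy:
    """Apply every applicable strategy, in order, each to the previous result."""
    def run(state: State) -> Optional[State]:
        applied = False
        for strategy in strategies:
            result = strategy(state)
            if result is not None:
                state, applied = result, True
        return state if applied else None
    return run


def repeat(strategy: Strategy) -> Strategy:
    """Apply a strategy until it is no longer applicable."""
    def run(state: State) -> Optional[State]:
        applied = False
        while True:
            result = strategy(state)
            if result is None:
                return state if applied else None
            state, applied = result, True
    return run


def make_rule(label: str) -> Strategy:
    """Toy rule: applicable once, it appends its label to the state."""
    return lambda state: state + (label,) if label not in state else None


rule1 = make_rule("a")
rule2: Strategy = lambda state: None             # never applicable
rule3 = make_rule("c")

print(first_rule(rule1, rule2, rule3)(()))         # only rule1 fires
print(all_rules(rule1, rule2, rule3)(()))          # rule1, then rule3
print(repeat(all_rules(rule1, rule2, rule3))(()))  # the naive fair strategy
```

With rule1 and rule3 applicable, the first call adds only the label of rule1, while the second adds both labels in order, mirroring the example in the text.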

In Fig. 3 we show an example of a correct but inefficient strategy for K4. 
With Lotrec it is easy to experiment and see what happens and what we save 
if we move the "not and" rule outside the innermost repeat (which is one of the 
improvements that make the strategy more efficient). More efficient versions are 
available at the Lotrec webpage. 

At present, strategies are applied globally, to all nodes and formulae. We 
plan future refinements where users may wish to define orderings among nodes 
or formulae and strategies applying rules only to the first element in the order. 

repeat firstRule 
    "stop"; 
    // propositional rules 
    repeat allRules 
        "not not"; "and"; "not and" 
    end end; 
    // generate and check successors 
    allRules 
        "diamond"; "K"; "4"; "InclusionTest" 
    end 
end end 



Fig. 3. A possible strategy for the logic K4 



6 Great, Where Can I Find Lotrec? 

Lotrec is available at http://www.irit.fr/ACTIVITES/LILaC/Lotrec 

One can also find there a library containing the standard modal logics such as 
K, KD, KT, K4, S4, KB, PDL, the modal logic of density, and several logics 
of knowledge and action, as well as intuitionistic logic. 



Acknowledgements. Thanks to the LILaC members who tested Lotrec and 
helped to improve it by their comments, in particular L. Aszalos, J.-F. 
Condotta, Th. Polacsek, and S. Suwanmanee. F. Massacci acknowledges the CNR 
Fellowship CNR-203-07-27. 

Abstract. This paper introduces ModProf, a new theorem prover and 
model finder for propositional and modal logic K. ModProf is based on 
labelled modal tableaux. Its novel feature is a sophisticated simplification 
algorithm using structural subsumption to detect redundancies. Further 
distinctive features are the use of syntactic branching, and an enhanced 
loop-checking algorithm using a cache of satisfiable worlds created in the 
course of the proof. Experimental results on problems of the TANCS 
2000 Theorem Prover comparison are presented. 



1 The ModProf Prover 

For the last few years, fast theorem proving in modal logics has been domi- 
nated by tableaux-based systems which employ semantic branching. However, 
recent research [5] suggests that syntactic branching may well be competitive, 
particularly in structured problems which often occur in real applications, 
provided simplification strategies are employed. ModProf¹ has been developed to 
support this claim. ModProf is a tableaux-based theorem prover which uses 
syntactic branching on disjunctions. It makes use of well-known optimization 
techniques, notably some of those employed in FaCT [3] and DLP [7], but in a 
novel, more general framework. Its simplification strategy consists of eliminating 
subsumptions, a technique borrowed from resolution-based theorem proving and 
adapted to nested K formulas. Formulas are stored in a compact form aiding 
the discovery of subsumptions. A dedicated cache holds previously expanded 
satisfiable worlds; the cache lookup also employs subsumption testing instead of 
identical matching. 



2 Architecture and Algorithm 

2.1 Data Structures 

Modal formulas are represented in ModProf as tuples, called templates, of 
the form w = (P, N, D, c), where P and N are (ordered) lists of propositional 
variables representing positive and negative literals, D is a list of dual templates, 
and c is a constraint template. Any two elements of D are sister templates of each 
other, and a template or dual template occurring inside w is called a subtemplate 
of w. We also define the inconsistent and the empty template, $\bot$ and $\emptyset$. 

¹ ModProf is available online at http://www.cs.sfu.ca/~cl/projects/Modprof. 

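Templates are interpreted as formulas via the following dual functions: 

$$f(p_1, \ldots, p_k, n_1, \ldots, n_l, d_1, \ldots, d_r, c) = p_1 \wedge \ldots \wedge p_k \wedge \neg n_1 \wedge \ldots \wedge \neg n_l \wedge f^-(d_1) \wedge \ldots \wedge f^-(d_r) \wedge \Box f(c)$$

$$f^-(p_1, \ldots, p_k, n_1, \ldots, n_l, d_1, \ldots, d_r, c) = \neg p_1 \vee \ldots \vee \neg p_k \vee n_1 \vee \ldots \vee n_l \vee f(d_1) \vee \ldots \vee f(d_r) \vee \Diamond f^-(c)$$

$$f(\bot) = \bot \qquad f^-(\bot) = \top \qquad f(\emptyset) = \top \qquad f^-(\emptyset) = \bot$$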

For example, $f(((p), nil, nil, (nil, (q), nil, \emptyset))) = p \wedge \Box(\neg q)$, where nil denotes 
the empty list. Thanks to the K-equivalence $\Box(p \wedge q) \equiv \Box p \wedge \Box q$, all the 
necessities in a conjunction can be gathered into one single constraint template. Due 
to the axiom $\Box\top$, we can set $c = \emptyset$ for any formula without a necessity conjunct. 
Disjunctions and possibilities are represented implicitly using dual templates. For 
example, $(nil, nil, (nil, nil, nil, (nil, (p), nil, \emptyset)), \emptyset)$ represents the singleton 
possibility $\Diamond p$; we call it a world template, as a world satisfying p will be created when 
this template is expanded. The apparent storage overhead for world templates is 
compensated by the elimination of negation, disjunction and possibility, which 
reduces the number of fields in the tuple. 
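
The two readings of a template can be made concrete with a small Python sketch. It is only an illustration, not ModProf's Lisp code: the class name `Template`, the field names, the ASCII connectives (`&`, `|`, `~`, `[]`, `<>`) and the use of `None` for the empty constraint template are all assumptions made here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Template:
    pos: Tuple[str, ...] = ()                 # P: positive literals
    neg: Tuple[str, ...] = ()                 # N: negative literals
    duals: Tuple["Template", ...] = ()        # D: dual templates
    constraint: Optional["Template"] = None   # c: None encodes the empty template


def _join(parts, op, unit):
    """Join sub-formula strings with op; an empty list yields the unit."""
    if not parts:
        return unit
    if len(parts) == 1:
        return parts[0]
    return "(" + f" {op} ".join(parts) + ")"


def f(t: Template) -> str:
    """Conjunctive reading of a template."""
    parts = list(t.pos) + ["~" + n for n in t.neg]
    parts += [f_dual(d) for d in t.duals]
    if t.constraint is not None:      # box-true is valid, so an empty c is dropped
        parts.append("[]" + f(t.constraint))
    return _join(parts, "&", "T")


def f_dual(t: Template) -> str:
    """Disjunctive (dual) reading of the same template."""
    parts = ["~" + p for p in t.pos] + list(t.neg)
    parts += [f(d) for d in t.duals]
    if t.constraint is not None:      # diamond-false is unsatisfiable, so dropped
        parts.append("<>" + f_dual(t.constraint))
    return _join(parts, "|", "F")


# The two examples from the text.
example = Template(pos=("p",), constraint=Template(neg=("q",)))
world = Template(duals=(Template(constraint=Template(neg=("p",))),))

print(f(example))   # (p & []~q)
print(f(world))     # <>p
```

The same tuple is read conjunctively by `f` and disjunctively by `f_dual`, which is why a formula and its negation can share one syntactic representation.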

Observe that a formula and its negation are represented by syntactically iden- 
tical templates, distinguished only by their depth within the template structure. 
This corresponds to the normalized form of FaCT [3], giving us the same ad- 
vantages as mentioned there. Additionally, we remark that transformation rules 
obeying the duality principle can be implemented efficiently. These include all 
simplification rules described below. Only in the model creation stage will the 
algorithm need to distinguish between conjunctions and disjunctions. 

Templates can be nested to arbitrary depth. Note that ModProf does not 
transform formulas into modal CNF or a similar form. Instead, modal formulas 
are first converted into negation normal form, from which the template form can 
be readily obtained; both steps take linear time. 

The knowledge base is just a conjunction of formulas and thus gets represented 
by a template, which we call the root template. (We also consider this a world 
template). Henceforth, we consider formulas and subformulas synonymous with 
the templates representing them. 



2.2 Subsumption and Simplification 

Prior to expanding templates into models, all input formulas to ModProf 
undergo a simplification process. At the heart of this process lies a subsumption 
detection algorithm [4]. (A formula F subsumes a formula G iff all models of G 
are models of F.) Subsumption checking is well-established in resolution-based 
theorem provers for formulas in CNF. Also, complete structural subsumption 
detection algorithms exist for weak description logics such as $\mathcal{FL}^-$. The novel 
algorithm of ModProf is also structural; while obviously not complete, it does 
cover all instances of the subsumption relation in the above two classes and 
extends further to disjunctions and modalities of arbitrary depth. 

Within the simplification process, any subformula F of the knowledge base 
is checked for subsumption against any formula G which holds in its scope. (In 
the propositional case, G holds in the scope of F exactly when F is a sister 
template, or a subtemplate of a sister template of G. The modal case is omitted 
here for brevity.) If any of these subsumption tests succeeds, F is redundant 
and gets erased. Empty templates arising from the simplification process exhibit 
local contradictions and lead to the parent template being erased. Components in 
unary templates, except for world templates, get “flattened out” into their parent 
template; since this operation changes their scope within the knowledge base, 
additional subsumption checks against the components of the parent template 
must be performed. This makes the simplification process recursive; termination 
is guaranteed, as the number of atoms in the knowledge base strictly decreases 
with every subsumption deletion. 
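
For the flat CNF special case mentioned above, subsumption deletion can be sketched in a few lines of Python. This is the textbook clause-level test, not ModProf's structural algorithm for nested templates; the function name and the string encoding of literals are assumptions. A clause F is redundant when another clause G of the knowledge base has all its literals in F, since every model of G is then a model of F.

```python
def remove_subsumed(clauses):
    """Drop each clause whose literal set contains another clause of the list.

    Clauses are iterables of literal strings such as "p" or "~p".
    Of several identical clauses, only the first is kept.
    """
    clauses = [frozenset(c) for c in clauses]
    kept = []
    for i, f in enumerate(clauses):
        redundant = any(
            g <= f and (g != f or j < i)
            for j, g in enumerate(clauses)
            if j != i
        )
        if not redundant:
            kept.append(f)
    return kept


kb = [{"p", "q"}, {"p"}, {"p", "q", "r"}, {"~r"}, {"p"}]
print(remove_subsumed(kb))   # only the clauses {p} and {~r} survive
```

Every deletion removes at least one literal occurrence from the knowledge base, which is the propositional counterpart of the termination argument given in the text.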

ModProf’s simplification algorithm can be seen as an extension of the clash 
detection used in DLP; it is also strictly more powerful than the simplification 
described in [5], thanks to the subsumption test replacing syntactic equivalence, 
and a pervasive application of the duality principle. It alone suffices to prove 
certain interesting problems such as all of the problem class k_lin_p in the 
Tableaux 1998 Theorem Prover Comparison without any branching. 

2.3 Model Creation, Heuristics, and Caching 

Upon user request, ModProf tests the simplified knowledge base for satisfia- 
bility by expanding the root template. The disjunctive nodes, i.e. all dual sub- 
templates of the root template with at least two components, are instantiated 
depth-first; each alternative instantiated from a dual template is checked for 
subsumption in all other components of the root template; this ensures that 
the knowledge base remains in simplified form. If an alternative fails, its nega- 
tion is added to the knowledge base as a lemma. Currently, ModProf employs 
chronological backtracking. However, a backjumping strategy compatible with 
the simplification algorithm can be devised. 

Once all disjunctions are expanded, the only remaining dual subtemplates are 
world templates. ModProf proceeds to expand them depth-first into accessible 
worlds (or shows them unsatisfiable) . Thus, an explicit Kripke model for the 
knowledge base is constructed recursively. ModProf reuses previous work and 
reduces the likelihood of exponential-space Kripke models by keeping a cache 
of all world templates that have been accessed on the current branch; with 
each successfully expanded template, a satisfying model is stored. Unlike most 
other theorem provers, a cache entry subsumed by a query template constitutes a 
match, in which case the query template is marked as satisfiable, and a link to the 
cached model is created. Note that this is done even if the cached model is still 
being constructed; this implements the conventional loop checking algorithm, 
necessary for termination in the presence of global axioms. 

Fig. 1. ModProf performance on the problem class qbf-cnfSSS-K4 of TANCS 2000 



3 Implementation 

ModProf has been written and compiled under Allegro Common Lisp Version 
4.3. The source code consists of roughly 1,500 non-comment lines, distributed 
over several modules. The standard user interface provides standard commands 
to maintain a knowledge base of formulas, test its consistency and print models. 

The experiments shown in Fig. 1 report running times on a class of problems 
of the TANCS 2000 Theorem Prover comparison [6]. They were obtained from 
a compiled version of ModProf on a Sparc Ultra 10 with 384 MB of main 
memory. Each problem class contains 8 problems for which two running times, 
measured in seconds, are reported: the first one is the geometric mean over all 8 
instances, the second only over those found satisfiable. The number of satisfiable 
problems and those timed out is also given. 

4 Strengths and Weaknesses 

The experimental results indicate that ModProf performs much better on sat- 
isfiable problems than on unsatisfiable problems. This is due to the fact that 
satisfiable world templates are cached, whereas unsatisfiable ones are discarded. 
This, in conjunction with the inferior chronological backtracking, leads to a 
large amount of thrashing, which is also the reason why ModProf still lags 
behind the fastest currently existing theorem provers, FaCT and DLP. However, 
ModProf’s normalized performance is already comparable to or better than that 
of two other tableau-based theorem provers, RACE [2] and *SAT [1], both of 
which are reported to make extensive use of caching, and one of which employs 
dependency-directed backtracking. (The comparison is based on data taken from 
[6].) All four systems use semantic branching or convert formulas to modal CNF. 

We conjecture that ModProf will perform better on structured formulas 
with repetitions of similar, but not necessarily equal, subformulas which can 
be simplified by the subsumption checker. These are the formulas which occur 
more commonly in real-world knowledge bases. On flat, random formulas such as 
random k-SAT, we expect ModProf to perform poorly; the asymptotic number 
$O(n^2)$ of initial subsumption checks (where n is the number of clauses) would 
be attained, but no simplification would arise if all clauses are guaranteed to be 
distinct and free of multiple atoms. 

5 Summary and Remaining Work 

At this stage, ModProf is only a research prototype. Many of its features, and 
definitely the code, have not been optimized for speed. Standard optimization 
techniques (caching, backjumping) [8] have not fully been implemented. Also, 
more extensive series of experiments will be necessary to make a final com- 
parative statement about the performance of ModProf. However, the current 
results suggest that a sophisticated simplification technique can drastically speed 
up tableau-based theorem proving. 

Our original goal was to address the question whether syntactic branching can 
compete in efficiency with semantic branching. The problems from the TANCS 
2000 comparison do not help here, as they are already in modal CNF. However, 
experiments on the Tableaux 1998 problem suite which features nested, struc- 
tured formulas, indicate no difference in relative performance to other systems 
in their 1998 implementations, which suggests that the answer is positive. 

The subsumption detection algorithm was shown to yield even greater im- 
provements on problems involving global axioms [4]; this is because the loop 
checking algorithm only requires an existing world subsumed by the current 
world in order to discover a periodicity and close the loop; other systems have 
to “wait” for an equivalent world template and thus incur a lot more branching. 

Abstract. Previous methods for generating random modal formulae 
(for the multi-modal logic $K_{(m)}$) result either in flawed test sets or in 
formulae that are too hard for current modal decision procedures and, also, 
unnatural. We present here a new system and generation methodology 
which results in unflawed test sets and more-natural formulae that are 
better suited for current decision procedures. 



Most empirical testing of decision procedures for propositional modal logics, 
usually for the multi-modal logic $K_{(m)}$, employs randomly generated formulae. 
This style of testing was initially proposed by Giunchiglia et al [3] and later im- 
proved by them and also by Hustadt and Schmidt [6,1]. Other kinds of randomly 
generated formulae have been proposed by Massacci [7]. Randomly generated 
formulae have been used with all the recent, highly optimised modal decision 
procedures, including DLP [8], FaCT [4], KSatC [1], *SAT [2], and TA [6], and 
have been used in several comparisons of these systems [7]. 

The basic idea behind the random generator in [3,6,1] is to generate a number 
of clauses. Each clause has the same number of disjuncts. As most testing has been 
on clauses with three disjuncts, the generated formulae are often called $3CNF_{\Box_m}$ 
formulae. Each disjunct in each clause is independently either a possibly negated 
propositional variable or a modal formula consisting of a possibly negated box 
over a clause. The embedded clauses are generated in the same way as the top- 
level clauses, except that at some maximum modal depth all the disjuncts are 
possibly negated propositional variables. Care is taken to not repeat disjuncts 
in a clause, nor to have complementary disjuncts in a clause. 

Testing consists of setting various parameters, usually the maximum modal 
depth, d, the probability that a disjunct is a propositional literal, p, and the 
number of propositional variables, N. Most tests use standard values for the 
other parameters, namely three disjuncts in a clause, C, and only one kind of 
modal box, m. Then several values are picked for the number of top-level clauses 
in a formula, L, and for each value for L many (usually about 100) formulae are 
generated and tested for satisfiability. 
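
The older generation scheme described above can be sketched in Python as follows. This is a naive illustration, not the code of any of the cited generators: the term encoding, the function names and the rejection loop are assumptions. A disjunct is a pair (negated, atom), and an atom is either a variable or a box over an embedded clause.

```python
import random


def random_clause(rng, depth, d, p, n_vars, c=3, m=1):
    """One clause of c disjuncts, free of repeated or complementary disjuncts."""
    if n_vars < c:
        raise ValueError("need at least c variables to fill a purely propositional clause")
    chosen = []    # unsigned atoms used so far in this clause
    clause = []
    while len(clause) < c:
        if depth == d or rng.random() < p:
            atom = ("var", rng.randint(1, n_vars))
        else:
            body = random_clause(rng, depth + 1, d, p, n_vars, c, m)
            atom = ("box", rng.randint(1, m), body)
        if atom in chosen:       # rejects both repeats and complements
            continue
        chosen.append(atom)
        clause.append((rng.choice([True, False]), atom))
    return tuple(clause)


def random_formula(rng, n_clauses, d, p, n_vars, c=3, m=1):
    """A conjunction of n_clauses (the parameter L of the text) top-level clauses."""
    return [random_clause(rng, 0, d, p, n_vars, c, m) for _ in range(n_clauses)]


rng = random.Random(0)
formula = random_formula(rng, n_clauses=20, d=2, p=0.5, n_vars=4)
print(len(formula), "clauses; first clause:", formula[0])
```

Because each disjunct is chosen independently, nothing prevents a top-level clause from being purely propositional, which is the source of the flaw discussed next.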

This testing methodology as described above provides a decent test for current 
modal decision procedures, allowing good comparisons of their behaviour. 
However, problems have remained with these methods, most notably that the 
generated formulae can contain pieces that make the entire formula easy to solve. 
Because even the fastest current modal decision procedures can only handle 
formulae with a few propositional variables, the presence of even a small number of 
top-level clauses with only propositional disjuncts can easily cover all the 
combinations of the propositional literals and make the entire formula unsatisfiable. 
(We call such formulae trivially unsatisfiable.) 

Previous attempts to eliminate this trivial unsatisfiability have concentrated 
on eliminating top-level propositional disjuncts by setting p = 0 [6]. However, 
formulae with propositional literals showing up only at the deepest modal depth 
are extremely hard to solve for modal depths greater than 1. Moreover, the 
resulting formulae are not very natural [5]. 

We have devised a new generation methodology for generating modal formulae 
for $K_{(m)}$ that eliminates or reduces the problems with the previous generation 
methods. The first new idea of our approach, first suggested in [5], is to eliminate 
strictly-propositional clauses except at the maximum modal depth by requiring 
that the number of propositional literals in each clause be less than 1 away from 
the average value, while still maintaining the overall ratio. For three disjuncts 
per clause (C = 3) and propositional probability one-half (p = 0.5) this means 
that half the clauses have 2 propositional literals and half have 1. 
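
One way to realise this constraint is to round the average number of propositional literals up or down at random, with the rounding probability chosen so that the mean is preserved. The following Python sketch shows this; the function name and the rounding scheme are assumptions made for illustration, not a description of the authors' generator.

```python
import math
import random


def prop_literal_count(rng, c, p):
    """Number of propositional literals for one clause of c disjuncts.

    The result is always floor(c*p) or ceil(c*p), so it lies less than 1
    away from the average c*p, and its expected value is exactly c*p.
    """
    mean = c * p
    low = math.floor(mean)
    frac = mean - low
    return low + (1 if rng.random() < frac else 0)


rng = random.Random(42)
counts = [prop_literal_count(rng, c=3, p=0.5) for _ in range(10000)]
print(sorted(set(counts)))        # [1, 2]: never 0 or 3 propositional literals
print(sum(counts) / len(counts))  # close to the average 1.5
```

With C = 3 and p = 0.5, no clause above the maximum modal depth can be purely propositional, since the count never reaches 3.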

To show the benefit of this change, we present two experimental runs for 
maximum modal depth d = 2 and propositional probability p = 0.5. In the 
old method the time curves are dominated by a “half-dome” shape, whose steep 
side shows up where the number of trivially unsatisfiable formulae becomes large 
before the formulae become otherwise easy to solve, as shown in Figure 1 (left). In 
fact, nearly all the unsatisfiable formulae here are propositionally unsatisfiable. 
With our new method, as shown in Figure 1 (right), the formulae are much 
more difficult to solve than with the old method, because there is no abrupt drop-off 
from propositional unsatisfiability, but they are much easier to solve than those 
generated with p = 0. Further, trivially unsatisfiable formulae do not appear at 
all in the interesting portion of the test sets. 

The second new idea of our approach is to allow the number of disjuncts in 
a clause C to vary in a manner similar to the number of propositional disjuncts. 
We then determine the number of propositional literals in each clause based on 
the number of disjuncts in that particular clause. 

We tested this generation methodology for several values of C. The results for 
C = 2.5 are given in Figure 2 (left). These formulae are much easier than those 
generated with C = 3, although they are still quite hard and form a reasonable 
source of testing data. Trivially unsatisfiable formulae appear in large numbers 
well after all the formulae are unsatisfiable and relatively easy. By reducing 
the number of disjuncts we are thus able to create interesting and unflawed test 
sets for higher modal depths. 

The last new idea of our approach is that the generator allows (optionally) 
direct specification of the probability distribution of the number of propositional 
atoms in a clause, and allows the distribution to be different for each modal depth 
from the top level to d - 1. It also allows direct specification of the probability 
distribution for the number of literals in a clause at each modal depth. Thus, 
the probability distribution for the number of propositional atoms depends on 
both the modal depth and the number of literals in the clause. 

Fig. 2. New method. Left column: m = 1, C = 2.5, d = 2, N = 3, ..., 6, and p = 0.5. 
Right column: m = 1, C = [[1, 8, 1]], d = 3, 4, N = 3, 4, p = [[[1, 0], [0, 3, 0], [0, 3, 3, 0]]]. 

As an example we present a set of tests with m = 1, C = [[1, 8, 1]], N = 
3, 4, d = 3, 4, and p = [[[1, 0], [0, 3, 0], [0, 3, 3, 0]]]; C represents the probability 
distribution for the number of literals in a clause, meaning “1/10, 8/10 and 1/10 
of the clauses are of length 1, 2 and 3 respectively”; p represents the probability 
distribution for the number of propositional atoms in a clause, meaning “1/1 and 0/1 
of the 1-literal clauses have 0 and 1 propositional literals respectively; 0/3, 3/3 and 
0/3 of the 2-literal clauses have 0, 1 and 2 propositional literals respectively; 0/6, 
3/6, 3/6 and 0/6 of the 3-literal clauses have 0, 1, 2 and 3 propositional literals 
respectively”. In this example, neither distribution varies with the modal depth. 
This set of tests introduces a small fraction of single-literal clauses that contain a 
modal literal (except at the greatest modal depth, where they contain, of course, a 
single propositional literal). The results of the tests are given in Figure 2 (right). 
Notice that we can generate test sets of interesting difficulty even with modal depth 4. 
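
The reading of the two parameters can be illustrated with a short Python sketch that samples the length of a clause and its number of propositional literals from the weight lists of this example. The function name is hypothetical, and reusing the last list for all deeper modal depths is an assumption made here to model distributions that do not vary with depth.

```python
import random

C = [[1, 8, 1]]                              # weights for clause lengths 1, 2, 3
P = [[[1, 0], [0, 3, 0], [0, 3, 3, 0]]]      # weights for 0..k propositional literals


def clause_shape(rng, depth, max_depth, c_spec, p_spec):
    """Sample (clause length, number of propositional literals) at a modal depth."""
    c_weights = c_spec[min(depth, len(c_spec) - 1)]   # assumption: last entry is reused
    lengths = list(range(1, len(c_weights) + 1))
    length = rng.choices(lengths, weights=c_weights)[0]
    if depth == max_depth:
        return length, length                # deepest level: propositional literals only
    p_weights = p_spec[min(depth, len(p_spec) - 1)][length - 1]
    counts = list(range(length + 1))
    n_prop = rng.choices(counts, weights=p_weights)[0]
    return length, n_prop


rng = random.Random(1)
print([clause_shape(rng, 0, 3, C, P) for _ in range(5)])
```

With these weights a 1-literal clause above the maximum depth always holds a modal literal, a 2-literal clause holds exactly one propositional literal, and a 3-literal clause holds one or two.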

Our generator also takes extreme care not to disturb the probability 
distributions of the generated clauses as the result of rejecting clauses that have 
repeated or contradictory disjuncts. It first selects a “shape” for the formula 
and only when this is determined does it select the rest of the formula. This care 
is necessary because otherwise larger disjuncts would be preferentially selected 
because they have less chance of a repetition or contradiction. 

Our generator is available for download at 

http://www-db.research.bell-labs.com/user/pfps/dlp. 

It can output formulae in various syntaxes and can be used to capture statistics 
from various modal decision procedures. 

Abstract. Kapur and Subramaniam [8] defined syntactical classes of 
equations where inductive validity is decidable. Thus, their validity can 
be checked without any user interaction and hence, this allows an 
integration of (a restricted form of) induction in fully automated reasoning tools 
such as model checkers. However, the results of [8] were restricted 
to equations. This paper extends the classes of conjectures considered in 
[8] to a larger class of arbitrary quantifier-free formulas (e.g., conjectures 
also containing negation, conjunction, disjunction, etc.). 



1 Introduction 

Inductive theorem provers usually require massive manual intervention and they 
may waste huge amounts of time on proof attempts which fail due to the in- 
completeness of the prover. Therefore, induction has not yet been integrated in 
fully automated reasoning systems (i.e., model checkers) used for hardware and 
protocol verification, static and type analyses, byte-code verification, and proof- 
carrying codes. Most such push-button systems use a combination of decision 
procedures for theories such as Presburger arithmetic, propositional satisfiability, 
and data structures including bit vectors, arrays, and lists. However, extending 
these tools by the capability to perform induction proofs would be very desirable, 
since induction is frequently needed to reason about structured and parameter- 
ized circuits (e.g., n-bit adders or multipliers), the timing behavior of circuits 
with feedback loops, and code using loops and/or recursion. 

For that reason, Kapur and Subramaniam proposed an approach for inte- 
grating induction schemes suggested by terminating function definitions with 
decision procedures, and gave a syntactical characterization of a class of equa- 
tions where inductive validity is decidable using decision procedures and the 
cover set method for mechanizing induction [8,11]. For those equations, induc- 
tion proofs can be accomplished without any user interaction and they only fail if 
the conjecture is not valid. In Section 2, we give a simple characterization which 
extends the class of decidable equations in [8]. Subsequently, we further extend 
the approach to arbitrary quantifier-free formulas, i.e., we define classes of such 
formulas where inductive validity is decidable. The crucial concept for this 
characterization is that of so-called correctness predicates. For a quantifier-free 
conjecture $\varphi$, $c_\varphi$ is a correctness predicate iff for any tuple of (constructor) 
ground terms $q^*$, the truth of $c_\varphi(q^*)$ implies the truth of $\varphi[x^*/q^*]$ (cf. [6,9]). 
We present a technique for automatically generating correctness predicates in Section 3. 

* Supported by the Deutsche Forschungsgemeinschaft Grant GI 274/4-1 and the 
National Science Foundation Grants nos. CCR-9996150 and CDA-9503064. 

The truth of a correctness predicate is only sufficient, but not necessary for 
the truth of the corresponding conjecture. In Section 4 we examine for which 
equations $\varphi$ the correctness predicate is exact (i.e., the truth of $c_\varphi(q^*)$ is both 
sufficient and necessary for the truth of $\varphi[x^*/q^*]$). We develop a characterization 
to recognize (a subclass of) these equations automatically. In Section 5 we show 
that the use of exact correctness predicates allows us to extend the decidable 
classes of inductive theorems from equations to arbitrary quantifier-free formulas. 

Our results are also useful for conventional inductive theorem provers since 
exact correctness predicates can be used to simplify the proof of conjectures like 
$\mathsf{double}(y) = y \Rightarrow y = 0$ where inductive provers would fail otherwise. 

Even though the paper focuses on constructor systems and the decidable 
theory of quantifier-free formulas on free constructors, we believe the approach 
extends to other decidable theories T as well (e.g., Presburger arithmetic). 



2 Equations Where Inductive Validity Is Decidable 

We use term rewrite systems (TRSs) $\mathcal{R}$ as our programming language [1]. In a 
TRS, all root symbols of left-hand sides are called defined and all other function 
symbols of $\mathcal{R}$ are constructors. We only consider constructor systems (CSs), 
i.e., TRSs where the left-hand sides contain no defined symbols below the root 
position, even though most of the results in this paper generalize to more general 
theory-based systems, called T-based systems in [8], with a decidable theory T, 
in which arguments to defined symbols are terms from T. Moreover, we restrict 
ourselves to (ground-)convergent and sufficiently complete CSs $\mathcal{R}$, i.e., for every 
ground term $t$ there exists a unique constructor ground term $q$ such that $t \to^*_{\mathcal{R}} q$. 
(A term containing only variables and constructors is called a constructor term; 
a constructor term without variables is a constructor ground term.) 

For induction proofs, we use the concept of cover sets [7,11]. A cover set is a 
finite set of pairs $\mathcal{C} = \{\langle s_1^*, \{t_{1,1}^*, \ldots, t_{1,n_1}^*\}\rangle, \ldots, \langle s_m^*, \{t_{m,1}^*, \ldots, t_{m,n_m}^*\}\rangle\}$, where 
$s_i^*$ and $t_{i,j}^*$ are n-tuples of terms (for some $n > 0$). $\mathcal{C}$ is complete if for every 
n-tuple $q^*$ of constructor ground terms, there is an $s_i^*$ and a substitution $\sigma$ such 
that $s_i^*\sigma = q^*$. Every cover set $\mathcal{C}$ induces a relation $<_{\mathcal{C}}$ on tuples of constructor 
ground terms: $p^* <_{\mathcal{C}} q^*$ iff there exists a pair $\langle s_i^*, \{t_{i,1}^*, \ldots, t_{i,n_i}^*\}\rangle \in \mathcal{C}$ such that 
$s_i^*\sigma = q^*$ and $t_{i,j}^*\sigma = p^*$ for some $j$ and some substitution $\sigma$. $\mathcal{C}$ is called 
well-founded iff $<_{\mathcal{C}}$ is well founded.¹ 

A quantifier-free formula $\varphi$ is inductively valid (or “valid” for short), denoted 
“$\mathcal{R} \models_{ind} \varphi$”,² iff $\forall y^*\, \varphi$ holds in the initial model of the equations of $\mathcal{R}$ (where $y^*$ 
are the variables in $\varphi$). For example, consider the following CS: 

$$\mathsf{half}(0) \to 0, \qquad \mathsf{half}(s(0)) \to 0, \qquad \mathsf{half}(s(s(x))) \to s(\mathsf{half}(x)).$$

¹ $<_{\mathcal{C}}$ is well founded if there exists no infinite sequence $\ldots <_{\mathcal{C}} t_2^* <_{\mathcal{C}} t_1^* <_{\mathcal{C}} t_0^*$. 

This function definition suggests the cover set $\mathcal{C}_{half} = \{\langle 0, \emptyset\rangle, \langle s(0), \emptyset\rangle, 
\langle s(s(x)), \{x\}\rangle\}$. To prove $\varphi$ by induction w.r.t. $\mathcal{C}_{half}$ (using the induction 
variable $y$), one obtains the base formulas $\varphi[y/0]$ and $\varphi[y/s(0)]$ and the step formula 
$\varphi[y/x] \Rightarrow \varphi[y/s(s(x))]$. Here, $\varphi[y/x]$ is the induction hypothesis and $\varphi[y/s(s(x))]$ 
is the induction conclusion. When proving a conjecture $\varphi$ containing a term 
$f(y_1, \ldots, y_n)$, a successful heuristic for the choice of an induction relation is to 
perform induction w.r.t. $\mathcal{C}_f$ using the induction variables $y_1, \ldots, y_n$, cf. [2,11]. 

Kapur and Subramaniam [8] characterized classes of equations where induc- 
tive validity is decidable (the decision procedure consists of an induction proof 
attempt w.r.t. a particular cover set). The observation is that if each induction 
formula built according to some cover set C only contains terms from an under- 
lying decidable theory, then validity of the original conjecture can be decided. 

Def. 1 and Thm. 2 apply to general T-based systems, but due to lack of 
space, we focus on the decidable quantifier-free theory of free constructors in 
this paper. Here, $r[s^*]$ abbreviates $r[y^*/s^*]$ where $y^*$ contains all variables in $r$. 

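Definition 1 ($\mathcal{C}$-provability). Let $\mathcal{R}$ be a convergent sufficiently complete CS 
and let $\mathcal{C}$ be a complete well-founded cover set. An equation $r_1 = r_2$ is $\mathcal{C}$-provable 
w.r.t. $\mathcal{R}$ iff $r_2$ is a constructor term, for every $\langle s_i^*, \{t_{i,1}^*, \ldots, t_{i,n_i}^*\}\rangle \in \mathcal{C}$, $s_i^*$ and all 
$t_{i,j}^*$ are tuples of constructor terms, and there exists a constructor term context $C_i$ 
such that $r_1[s_i^*] \to^*_{\mathcal{R}} C_i[r_1[t_{i,1}^*], \ldots, r_1[t_{i,n_i}^*]]$. 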

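As an example, let us extend the CS for half by the rules $\mathsf{double}(0) \to 0$ and 
$\mathsf{double}(s(x)) \to s(s(\mathsf{double}(x)))$. Then the equation $\mathsf{double}(\mathsf{half}(y)) = y$ is 
$\mathcal{C}_{half}$-provable. As required, the term $y$ is a constructor term. Moreover, we obtain 

$r_1[s_1] = \mathsf{double}(\mathsf{half}(0)) \to^*_{\mathcal{R}} 0$ and thus, $C_1 = 0$, 

$r_1[s_2] = \mathsf{double}(\mathsf{half}(s(0))) \to^*_{\mathcal{R}} 0$ and thus, $C_2 = 0$, 

$r_1[s_3] = \mathsf{double}(\mathsf{half}(s(s(x)))) \to^*_{\mathcal{R}} s(s(\mathsf{double}(\mathsf{half}(x))))$ and thus, $C_3 = s(s(\Box))$. 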

Since $\mathcal{C}$-provability is decidable, Def. 1 characterizes a decidable class of 
conjectures. Instead of checking $\mathcal{C}$-provability directly, several sufficient conditions 
for $\mathcal{C}$-provability were given in [8]. We obtain the following theorem. 

Theorem 2 (Decidability of inductive validity for equations). Let $\mathcal{R}$ be 
a convergent sufficiently complete CS, let $\mathcal{C}$ be a complete well-founded cover set, 
and let $r_1 = r_2$ be a $\mathcal{C}$-provable equation. Then inductive validity of $r_1 = r_2$ is 
decidable (by attempting an induction proof w.r.t. $\mathcal{C}$). 

Proof. The decision procedure works by constructing the formulas 

$$C_i[r_2[t_{i,1}^*], \ldots, r_2[t_{i,n_i}^*]] = r_2[s_i^*] \qquad (1)$$

for all $\langle s_i^*, \{t_{i,1}^*, \ldots, t_{i,n_i}^*\}\rangle \in \mathcal{C}$. As these equations only contain constructor 
terms, their validity is decidable. 

It turns out that $r_1 = r_2$ is valid iff all these equations are valid. For the 
“if”-direction, notice that (1) implies the induction formula 

$$r_1[t_{i,1}^*] = r_2[t_{i,1}^*] \wedge \ldots \wedge r_1[t_{i,n_i}^*] = r_2[t_{i,n_i}^*] \;\Rightarrow\; r_1[s_i^*] = r_2[s_i^*].$$

Thus, the validity of $r_1 = r_2$ follows by Noetherian induction. For the “only 
if”-direction, note that the validity of $r_1 = r_2$ implies the validity of (1). □ 

² $\mathcal{R} \models_{ind} \varphi$ means that for all constructor ground terms $q^*$, $\varphi[y^*/q^*]$ follows from $\mathcal{R}$’s 
equations and axioms stating that different constructor ground terms are not equal. 

Since double(half(y)) = y is Chairprovable, the above decision procedure can 
determine its validity. It has to check the validity of the equations 

Ci[?’2[^i]] = ?'2[si], i-e., 0 = 0 , ( 2 ) 

C2[r2[t2]] = r2[s2], i.e .,0 = s( 0 ), ( 3 ) 

C'3[r2[t3]] = ?'2[s3], i-e., s(s(x)) = s(s(x)). ( 4 ) 



Since these equations only contain constructor terms, their validity is decidable. 
(Obviously, such an equation is valid iff both terms in the equation are syntacti- 
cally identical.) While (2) and (4) are valid, the second equation (3) is not valid 
and thus, the conjecture double(half(y)) = y is not valid either. 
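
The check of equations (2) - (4) can be replayed mechanically. The following Python sketch is only an illustration and not part of the original development: the tuple encoding of constructor terms and the names `equations` and `valid` are assumptions, and the evaluated sides of (2) - (4) are entered by hand rather than computed by rewriting.

```python
# Constructor terms as nested tuples (illustrative encoding):
# ("0",) is 0, ("s", t) is s(t), and a plain string such as "x" is a variable.
ZERO = ("0",)

def s(t):
    return ("s", t)

# For each pair of the cover set C_half: the pair
# (C_i[r2[t_i]], r2[s_i]) where r2 is the term y.
equations = {
    2: (ZERO, ZERO),             # 0 = 0
    3: (ZERO, s(ZERO)),          # 0 = s(0)
    4: (s(s("x")), s(s("x"))),   # s(s(x)) = s(s(x))
}

def valid(lhs, rhs):
    # An equation between constructor terms is valid
    # iff both sides are syntactically identical.
    return lhs == rhs

results = {n: valid(lhs, rhs) for n, (lhs, rhs) in equations.items()}
conjecture_valid = all(results.values())
```

Here `results` records the verdict for each of the three equations, and `conjecture_valid` is the verdict of the decision procedure for double(half(y)) = y.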

Our aim is to extend the result of Thm. 2 to more general formulas (i.e., not 
just equations), provided that all equations in these formulas are C-provable. 
For example, we would like to consider formulas like double(half(y)) = y $\Leftrightarrow$
even(y) = true or double(y) = y $\Rightarrow$ y = 0. Equations appearing in these formulas
are neither valid nor unsatisfiable; consequently, there is a need to characterize 
the subset of instantiations for the variables for which these equations are true. 
For this extension, we need the notion of correctness predicates. 

3 Correctness Predicates 

We present a technique which automatically generates algorithms for so-called 
correctness predicates $c_\varphi$ for equations $\varphi$. For any tuple of constructor ground
terms $q^*$, the truth of $c_\varphi(q^*)$ implies that $\varphi[y^*/q^*]$ is valid. Our definition of
correctness predicates is similar to the definitions of [6,9], but its form is quite 
restricted since we are interested in ensuring that validity of correctness predi- 
cates is decidable and that exact correctness predicates can be generated which 
completely characterize the domain of values on which the conjecture holds. 

We have seen that the proof of the conjecture double(half(y)) = y can be
attempted by induction w.r.t. the cover set $C_{half}$. If y = 0, the conjecture can
be reduced to the equation (2) which is always true. In the case y = s(0) we
obtain the equation (3) which is always false. Finally, in the step case where
y = s(s(x)), we have to prove that the induction hypothesis double(half(x)) = x
implies the induction conclusion double(half(s(s(x)))) = s(s(x)). As shown in
Section 2, double(half(s(s(x)))) evaluates to s(s(double(half(x)))). Due to the
induction hypothesis, we can replace the subterm double(half(x)) by x. Thus,
we obtain the equation (4) (which is always true). Hence, provided that the
induction hypothesis is valid, the induction conclusion would also be valid. This
gives rise to the following rules for the correctness predicate $c_{double(half(y))=y}$.

$c_{double(half(y))=y}(0) \to true$, (5)

$c_{double(half(y))=y}(s(0)) \to false$, (6)

$c_{double(half(y))=y}(s(s(x))) \to c_{double(half(y))=y}(x)$. (7)

Thus, we have synthesized the even algorithm. Note that the rule (7) is stronger 
than the following rule one would have gotten from the above analysis: 

$c_{double(half(y))=y}(s(s(x))) \to true$ if $c_{double(half(y))=y}(x)$.

Since we want to generate unconditional rewrite rules for the definition of cor- 
rectness predicates and to synthesize a complete definition, we use the form (7). 
As a result, the correctness predicate so generated may not be exact, and hence, 
provides only a sufficient condition for the conjecture to be valid. 
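
As a sanity check, rules (5) - (7) can be run against a direct evaluation of the conjecture. The sketch below is illustrative only: it models the natural numbers 0, s(0), s(s(0)), ... by Python integers (an assumption made for brevity), and it takes the rules of half from the evaluations shown earlier in the text.

```python
def half(n):
    # half(0) -> 0, half(s(0)) -> 0, half(s(s(x))) -> s(half(x))
    return 0 if n < 2 else 1 + half(n - 2)

def double(n):
    # double(0) -> 0, double(s(x)) -> s(s(double(x)))
    return 0 if n == 0 else 2 + double(n - 1)

def c(n):
    # Correctness predicate of double(half(y)) = y.
    if n == 0:
        return True       # rule (5)
    if n == 1:
        return False      # rule (6)
    return c(n - 2)       # rule (7)

# Compare the synthesized predicate with the conjecture itself.
agreement = all(c(n) == (double(half(n)) == n) for n in range(30))
accepted = [n for n in range(8) if c(n)]
```

On this finite range the predicate coincides with the conjecture and accepts exactly the even numbers, which matches the observation that the even algorithm has been synthesized.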

In general, to prove a C-provable equation $r_1 = r_2$ w.r.t. a cover set $C$,
for each pair $(s_i^*, \{t_{i,1}^*, \ldots, t_{i,n_i}^*\}) \in C$ we must check whether the equation
$C_i[r_2[t_{i,1}^*], \ldots, r_2[t_{i,n_i}^*]] = r_2[s_i^*]$ is valid, cf. Equation (1) in the proof of Thm. 2.
In order to obtain correctness predicates as simple as the ones above, we have
to demand that these equations are either valid for all instantiations or for none. 
This ensures that the right-hand sides of the rules for correctness predicates only 
have the form true, false, or recursive calls of correctness predicates. 

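Definition 3 (Radical equations). Let $\mathcal{R}$ be a convergent sufficiently complete
CS and let $C = \{(s_1^*, \{t_{1,1}^*, \ldots, t_{1,n_1}^*\}), \ldots, (s_m^*, \{t_{m,1}^*, \ldots, t_{m,n_m}^*\})\}$ be a
complete well-founded cover set. An equation $r_1 = r_2$ is radical under $C$ iff
$r_1 = r_2$ is a C-provable equation where $r_1[s_i^*] \to_{\mathcal{R}}^* C_i[r_1[t_{i,1}^*], \ldots, r_1[t_{i,n_i}^*]]$ for a
constructor term context $C_i$ and for all $1 \le i \le m$ we have

$\mathcal{R} \models_{ind} C_i[r_2[t_{i,1}^*], \ldots, r_2[t_{i,n_i}^*]] = r_2[s_i^*]$ or
$\mathcal{R} \models_{ind} \neg\, C_i[r_2[t_{i,1}^*], \ldots, r_2[t_{i,n_i}^*]] = r_2[s_i^*]$.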

Note that since all Ci, s*, and t* are constructor terms, it is decidable whether 
a C-provable equation is radical. The reason is that one only has to check whether 
an equation between two constructor terms is valid or unsatisfiable. Obviously, 
such an equation is unsatisfiable iff the two terms are not unifiable. For instance, 
the equation double(half(y)) = y is radical under $C_{half}$ since the terms in the
equations (2) - (4) are either identical or not unifiable. 
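
The radicality test described above only needs syntactic identity and unifiability of constructor terms. The following Python sketch shows one way to carry it out; the tuple encoding of terms and the helper names `unify` and `classify` are assumptions introduced for illustration, and the unification routine is a textbook one rather than anything prescribed by the paper.

```python
ZERO = ("0",)

def s(t):
    return ("s", t)

def is_var(t):
    return isinstance(t, str)

def walk(t, sub):
    # Follow variable bindings until an unbound variable or a non-variable.
    while is_var(t) and t in sub:
        t = sub[t]
    return t

def occurs(v, t, sub):
    t = walk(t, sub)
    if is_var(t):
        return v == t
    return any(occurs(v, a, sub) for a in t[1:])

def unify(a, b, sub=None):
    """Return a unifier of a and b as a dict, or None if none exists."""
    sub = {} if sub is None else sub
    a, b = walk(a, sub), walk(b, sub)
    if a == b:
        return sub
    if is_var(a):
        return None if occurs(a, b, sub) else {**sub, a: b}
    if is_var(b):
        return None if occurs(b, a, sub) else {**sub, b: a}
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        sub = unify(x, y, sub)
        if sub is None:
            return None
    return sub

def classify(lhs, rhs):
    # valid: identical terms; unsatisfiable: not unifiable; else neither.
    if lhs == rhs:
        return "valid"
    if unify(lhs, rhs) is None:
        return "unsatisfiable"
    return "neither"

# Equations (2) - (4) for double(half(y)) = y under C_half.
eqs = [(ZERO, ZERO), (ZERO, s(ZERO)), (s(s("x")), s(s("x")))]
radical = all(classify(l, r) != "neither" for l, r in eqs)
```

An equation such as s(x) = s(0) would be classified as "neither", since it is satisfiable but not valid; a conjecture producing such an equation would not be radical.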

To ease the presentation, we will now restrict ourselves to cover sets where
there is at most one induction hypothesis for every induction step case.^ Thus,
we only consider cover sets with pairs $(s_i^*, \{t_{i,1}^*, \ldots, t_{i,n_i}^*\})$ where $0 \le n_i \le 1$.

Then we obtain the following definition of correctness predicates.

^ The definition of correctness predicates can be easily generalized to the case of multiple
induction hypotheses. In fact, correctness predicates can be defined for arbitrary
equations, i.e., they do not have to be C-provable or radical as required in this paper.
However, these requirements are necessary in order to generate exact correctness
predicates $c_\varphi$ for arbitrary conjectures $\varphi$, such that validity of $c_\varphi$ is decidable.

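Definition 4 (Correctness Predicate). Let $\mathcal{R}$, $C$, $r_1 = r_2$ be as in Def. 3
where $0 \le n_i \le 1$ for all $1 \le i \le m$ and let $r_1 = r_2$ be radical under $C$. Then the
correctness predicate $c_{r_1=r_2}$ under $C$ is defined by the following rules:

$c_{r_1=r_2}(s_i^*) \to true$, if $\mathcal{R} \models_{ind} C_i = r_2[s_i^*]$ and $n_i = 0$, (8)

$c_{r_1=r_2}(s_i^*) \to false$, if $\mathcal{R} \models_{ind} \neg\, C_i = r_2[s_i^*]$ and $n_i = 0$, (9)

$c_{r_1=r_2}(s_i^*) \to c_{r_1=r_2}(t_{i,1}^*)$, if $\mathcal{R} \models_{ind} C_i[r_2[t_{i,1}^*]] = r_2[s_i^*]$ and $n_i = 1$, (10)

$c_{r_1=r_2}(s_i^*) \to false$, if $\mathcal{R} \models_{ind} \neg\, C_i[r_2[t_{i,1}^*]] = r_2[s_i^*]$ and $n_i = 1$. (11)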



Thm. 5 proves that a correctness predicate indeed represents a sufficient, but 
not a necessary condition for the soundness of the corresponding equation. 

Theorem 5 (Correctness predicates are sufficient, but not necessary).
Let $\mathcal{R}$, $C$, $r_1 = r_2$ be as in Def. 4. Let $c_{r_1=r_2}$ be a correctness predicate for $r_1 = r_2$
under $C$ and let $\mathcal{R}$ also contain the rules defining $c_{r_1=r_2}$. Then we have

(a) $\mathcal{R} \models_{ind} c_{r_1=r_2}(y^*) = true \Rightarrow r_1 = r_2$.

(b) In general, we have $\mathcal{R} \not\models_{ind} r_1 = r_2 \Rightarrow c_{r_1=r_2}(y^*) = true$.

Proof.



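(a) Let $q^*$ be a tuple of constructor ground terms such that $\mathcal{R} \models_{ind} c_{r_1=r_2}(q^*) =
true$. We prove $\mathcal{R} \models_{ind} r_1[q^*] = r_2[q^*]$ by induction w.r.t. $<_C$. Due to the
completeness of the cover set, there exists some $(s^*, \{t_1^*, \ldots, t_n^*\}) \in C$ and
some substitution $\sigma$ such that $q^* = s^*\sigma$ and since $r_1 = r_2$ is C-provable (due
to its radicality), we have $\mathcal{R} \models_{ind} r_1[s^*] = C[r_1[t_1^*], \ldots, r_1[t_n^*]]$.

If $n = 0$, then we also have $\mathcal{R} \models_{ind} C = r_2[s^*]$ and thus $\mathcal{R} \models_{ind} r_1[s^*] =
r_2[s^*]$. If $n = 1$, we have $\mathcal{R} \models_{ind} C[r_2[t_1^*]] = r_2[s^*]$ and $\mathcal{R} \models_{ind} c_{r_1=r_2}(t_1^*\sigma) =
true$. The induction hypothesis yields $\mathcal{R} \models_{ind} r_1[t_1^*\sigma] = r_2[t_1^*\sigma]$. From the
validity of $r_1[s^*] = C[r_1[t_1^*]]$ and $C[r_2[t_1^*]] = r_2[s^*]$, we obtain $\mathcal{R} \models_{ind} r_1[s^*\sigma] = r_2[s^*\sigma]$.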

(b) Consider the equation half(y) = s(0) and induction w.r.t. the cover set $C_{half}$.
In the base cases y = 0 and y = s(0) the resulting conjecture 0 = s(0) is
unsatisfiable and in the step case, the induction conclusion half(s(s(x))) =
s(0) can be evaluated to s(half(x)) = s(0). Applying the induction hypothesis
half(x) = s(0) yields s(s(0)) = s(0) which is unsatisfiable. So the equation
half(y) = s(0) is radical under $C_{half}$ and we obtain the rules $c_{half(y)=s(0)}(0) \to
false$, $c_{half(y)=s(0)}(s(0)) \to false$, and $c_{half(y)=s(0)}(s(s(x))) \to false$. So $c_{half(y)=s(0)}$
is always false, but half(y) = s(0) holds for $s^2(0)$ and $s^3(0)$. □
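
The counterexample in part (b) is easy to reproduce. The sketch below is illustrative only; it again models natural numbers by Python integers, which is an assumption of the sketch and not of the paper.

```python
def half(n):
    # half(0) -> 0, half(s(0)) -> 0, half(s(s(x))) -> s(half(x))
    return 0 if n < 2 else 1 + half(n - 2)

def c(n):
    # All three rules of c_{half(y)=s(0)} have the right-hand side false.
    return False

# Arguments for which half(y) = s(0) actually holds ...
holds = [n for n in range(10) if half(n) == 1]
# ... versus arguments accepted by the correctness predicate.
accepted = [n for n in range(10) if c(n)]
```

The predicate is sound (it never accepts a wrong argument) but it misses the two arguments in `holds`, so it is not exact.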

In fact, a correctness predicate $c_\varphi(q^*)$ yields true iff the equation $\varphi$ holds
both for $q^*$ and for all arguments $p^*$ which are smaller than $q^*$ w.r.t. the induction
relation induced by the cover set. For that reason, the correctness predicate
$c_{half(y)=s(0)}$ returns false for the arguments $s^2(0)$ and $s^3(0)$ although the conjecture
is true, since it is false for the smaller arguments 0 and s(0).

4 Conjectures with Exact Correctness Predicate

In this section we characterize equations $r_1 = r_2$ where the correctness predicate
$c_{r_1=r_2}$ is exact, i.e., for all $q^*$, $c_{r_1=r_2}(q^*)$ is true iff $\mathcal{R} \models_{ind} r_1[q^*] = r_2[q^*]$.
Exactness is ensured if in Def. 4, whenever Rule (10) is used, the induction
conclusion $r_1[s_i^*] = r_2[s_i^*]$ is equivalent to the induction hypothesis
$r_1[t_{i,1}^*] = r_2[t_{i,1}^*]$. As we have seen in
Sect. 3, $c_{r_1=r_2}(q^*)$ only returns true if $r_1 = r_2$ is true for $q^*$ and for all $p^*$ smaller
than $q^*$ w.r.t. the induction relation induced by the cover set. Thus, $c_{r_1=r_2}$ is only
exact if $r_1[q^*] = r_2[q^*]$ implies the validity of $r_1[p^*] = r_2[p^*]$ for all arguments
$p^* <_C q^*$. So $c_{r_1=r_2}$ only describes the exact set of instantiations where $r_1 = r_2$
is valid, if each induction conclusion implies all its induction hypotheses.

Consider again the proof of double(half(y)) = y by induction w.r.t. $C_{half}$. We
obtain the induction conclusion double(half(s(s(x)))) = s(s(x)) and the induction
hypothesis double(half(x)) = x. Indeed, this conjecture has the desired property

$\mathcal{R} \models_{ind} double(half(s(s(x)))) = s(s(x)) \Rightarrow double(half(x)) = x.$ (12)

To see this, note that in the first base case where y = 0, the left-hand side 
double(half(0)) evaluates to 0, which is smaller than or equal to the right-hand 
side 0 (if terms are compared by the subterm relation, for example). Similarly, 
in the second base case where y = s(0), the left-hand side evaluates to 0, which 
is again smaller than or equal to the right-hand side s(0). In the step case, the 
left hand side of the induction conclusion can be evaluated to 

$s(s(\underline{double(half(x))})) = s(s(\underline{x})).$

This evaluated induction conclusion contains the induction hypothesis, since the 
underlined terms are the terms on both sides of the induction hypothesis. (This 
observation also forms the basis of the rippling technique [3].) Thus, when going 
from the induction hypothesis to the induction conclusion, both sides of the 
equation grow by the context $s(s(\Box))$. In other words, in the induction base cases
the left-hand side is at most as great as the right-hand side and afterwards, the 
left-hand side always grows at most as much as the right-hand side. Thus, if one 
ever reaches an instantiation t where double(half(t)) = t is no longer true, then
the reason is that double(half(t)) is smaller than t. But since double(half(y))
grows at most as fast as y, afterwards there can never be a number $s >_{C_{half}} t$
where double(half(s)) = s is true again. Hence, if the induction hypothesis
double(half(x)) = x is false, then the induction conclusion double(half(s(s(x)))) =
s(s(x)) is false as well (or, formulated as a contraposition, we have Property (12)).

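The observation above leads to a general criterion. For many C-provable
equations $r_1 = r_2$, one does not only have $r_1[s_i^*] \to_{\mathcal{R}}^* C_i[r_1[t_{i,1}^*], \ldots, r_1[t_{i,n_i}^*]]$ for
all $(s_i^*, \{t_{i,1}^*, \ldots, t_{i,n_i}^*\}) \in C$, but also $r_2[s_i^*] = D_i[r_2[t_{i,1}^*], \ldots, r_2[t_{i,n_i}^*]]$ for some
constructor ground contexts $C_i$ and $D_i$.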

In our example, $r_1$ is double(half(y)) and $r_2$ is the term y. For the first pair of
the cover set $C_{half}$, we have $C_1 = 0$ and $D_1 = 0$ and for the second pair we have
$C_2 = 0$ and $D_2 = s(0)$. For the third pair, we have $r_1[s_3] = double(half(s(s(x))))$,
which can be evaluated to s(s(double(half(x)))) and as $t_{3,1} = x$, we obtain $C_3 =
s(s(\Box))$. Since $r_2[s_3] = s(s(x))$, we also have $D_3 = s(s(\Box))$.

So $r_1$ grows by the context $C_i$ and $r_2$ grows by the context $D_i$ when going from
the induction hypothesis $r_1[t_{i,1}^*] = r_2[t_{i,1}^*]$ to the induction conclusion $r_1[s_i^*] =
r_2[s_i^*]$. Our aim is to ensure that whenever $r_1$ and $r_2$ are no longer $\mathcal{R}$-equal for
some instantiation, then they will never become equal again for arguments which
are greater w.r.t. the induction relation induced by the cover set. A sufficient
requirement for this is that the contexts $C_i$ added around $r_1$ are always at
most as big as the contexts $D_i$ added around $r_2$. To compare these contexts
one can use an arbitrary ordering $\prec$ on constructor terms, i.e., any relation
which is transitive and irreflexive. Moreover, we require $\prec$ to be monotonic (i.e.,
$s \prec t$ implies $f(\ldots s \ldots) \prec f(\ldots t \ldots)$ for all constructors $f$) and stable under
substitutions (i.e., $s \prec t$ implies $s\sigma \prec t\sigma$). Then we only have to demand

$C_i[x^*] \preceq D_i[x^*]$ for all $1 \le i \le m$.

As usual, $\preceq$ denotes the union of $\prec$ and “=” where “=” is syntactic equality.

Note that one may use any well-established technique for the generation of
well-founded orderings such as the subterm ordering or the recursive path ordering
$\prec_{rpo}$ (cf. e.g. [5,10]) to synthesize a suitable ordering $\prec$ satisfying the above
constraints. Moreover, since $\prec$ only has to be irreflexive, but not necessarily well
founded, one can also use any ordering $\succ$ which results from the reversal of such
a well-founded ordering $\prec$ (e.g., the superterm ordering or $\succ_{rpo}$).

In our example we need a well-founded monotonic stable ordering $\prec$ where

$C_1 = 0 \preceq 0 = D_1$,

$C_2 = 0 \preceq s(0) = D_2$,

$C_3[x] = s(s(x)) \preceq s(s(x)) = D_3[x]$.

Such an ordering can easily be found by standard techniques for automated
termination proofs. For example, the constraints are satisfied by the subterm ordering.
Thus, one can automatically determine that double(half(y)) = y is a conjecture
whose correctness predicate is exact. As $c_{double(half(y))=y}$ is only true for even
numbers, we have shown that indeed this conjecture is false for all odd ones.
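
The three constraints above can be checked with a few lines of code. The sketch below is a hedged illustration: it hard-codes the subterm ordering (one admissible choice among many), represents constructor terms as nested tuples, and uses the hypothetical helper names `prec` and `preceq`. A real implementation would typically search for a suitable ordering with a termination prover instead.

```python
ZERO = ("0",)

def s(t):
    return ("s", t)

def subterms(t):
    # Enumerate t and all of its subterms; strings are variables.
    yield t
    if not isinstance(t, str):
        for arg in t[1:]:
            yield from subterms(arg)

def prec(a, b):
    # Subterm ordering: a is strictly smaller than b
    # iff a is a proper subterm of b.
    return a != b and any(a == u for u in subterms(b))

def preceq(a, b):
    # Union of the strict ordering and syntactic equality.
    return a == b or prec(a, b)

# Pairs (C_i[x], D_i[x]) for double(half(y)) = y under C_half.
constraints = [
    (ZERO, ZERO),
    (ZERO, s(ZERO)),
    (s(s("x")), s(s("x"))),
]
maintains = all(preceq(c_i, d_i) for c_i, d_i in constraints)
```

Since `maintains` only tests one fixed ordering, a negative answer would not show that no suitable ordering exists; it would only show that the subterm ordering is not one.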

In general, if $r_1 = r_2$ is an equation and $C$ is a cover set such that the
above conditions are satisfied by some ordering $\prec$, then we say that $r_1 = r_2$
maintains $\prec$ under the cover set $C$ w.r.t. the underlying CS $\mathcal{R}$. The reason is
that the relation $\preceq$ between $r_1$ and $r_2$ is indeed maintained when going from
an induction hypothesis to an induction conclusion. By using established (and
decidable classes of) well-founded orderings $\prec$ from the area of term rewrite systems
one immediately obtains a syntactical sufficient condition for maintenance
of orderings, which can easily be checked automatically.

Definition 6 (Maintenance of orderings). Let $\mathcal{R}$ be a convergent sufficiently
complete CS and let $C = \{(s_1^*, \{t_{1,1}^*, \ldots, t_{1,n_1}^*\}), \ldots, (s_m^*, \{t_{m,1}^*, \ldots, t_{m,n_m}^*\})\}$ be
a complete well-founded cover set (where $0 \le n_i \le 1$ for all $1 \le i \le m$). Let
$r_1 = r_2$ be C-provable and let $C_i$ and $D_i$ be constructor ground contexts where

$r_1[s_i^*] \to_{\mathcal{R}}^* C_i[r_1[t_{i,1}^*], \ldots, r_1[t_{i,n_i}^*]]$ and

$r_2[s_i^*] = D_i[r_2[t_{i,1}^*], \ldots, r_2[t_{i,n_i}^*]]$.

Let $\prec$ be a monotonic ordering on constructor terms which is stable under substitutions.
We say $r_1 = r_2$ maintains $\prec$ under the cover set $C$ w.r.t. $\mathcal{R}$ iff
$C_i[x^*] \preceq D_i[x^*]$ for all $1 \le i \le m$.

The following lemma proves that for equations which maintain an ordering, 
each induction conclusion indeed implies its induction hypothesis. 

Lemma 7 (Equations where the reverse induction formulas hold). Let
$\mathcal{R}$, $C$, $\prec$ be as in Def. 6 and let $r_1 = r_2$ maintain $\prec$ under $C$ w.r.t. $\mathcal{R}$. Then for
all $1 \le i \le m$ with $n_i = 1$, $\mathcal{R} \models_{ind} r_1[s_i^*] = r_2[s_i^*] \Rightarrow r_1[t_{i,1}^*] = r_2[t_{i,1}^*]$.

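Proof. We first show that for all constructor ground terms $q^*$, we have

$r_1[q^*]\downarrow_{\mathcal{R}} \preceq r_2[q^*].$ (13)

The proof of (13) is done by induction w.r.t. $<_C$. Due to the completeness of
$C$, there must be a pair $(s_i^*, \{t_{i,1}^*, \ldots, t_{i,n_i}^*\}) \in C$ such that $s_i^*\sigma = q^*$. If $n_i = 0$,
then we have $r_1[q^*]\downarrow_{\mathcal{R}} = r_1[s_i^*\sigma]\downarrow_{\mathcal{R}} = C_i \preceq D_i = r_2[s_i^*\sigma] = r_2[q^*]$.

Otherwise, if $n_i = 1$, we have $r_1[q^*]\downarrow_{\mathcal{R}} = r_1[s_i^*\sigma]\downarrow_{\mathcal{R}} = C_i[r_1[t_{i,1}^*\sigma]\downarrow_{\mathcal{R}}] \preceq
C_i[r_2[t_{i,1}^*\sigma]]$ by the induction hypothesis and monotonicity and stability of $\prec$.
Furthermore, $C_i[r_2[t_{i,1}^*\sigma]] \preceq D_i[r_2[t_{i,1}^*\sigma]] = r_2[s_i^*\sigma] = r_2[q^*]$. So (13) is proved.

Now we can prove Lemma 7. Let $\sigma$ substitute all variables of $s_i^*$ by constructor
ground terms such that $\mathcal{R} \models_{ind} r_1[s_i^*\sigma] = r_2[s_i^*\sigma]$. We assume that $\mathcal{R} \not\models_{ind}
r_1[t_{i,1}^*\sigma] = r_2[t_{i,1}^*\sigma]$. By (13) we must have $r_1[t_{i,1}^*\sigma]\downarrow_{\mathcal{R}} \preceq r_2[t_{i,1}^*\sigma]$ and since the
$\mathcal{R}$-normal forms of $r_1[t_{i,1}^*\sigma]$ and $r_2[t_{i,1}^*\sigma]$ are different by assumption this in fact
implies $r_1[t_{i,1}^*\sigma]\downarrow_{\mathcal{R}} \prec r_2[t_{i,1}^*\sigma]$. Since $\prec$ is monotonic and stable we have

$r_1[s_i^*\sigma]\downarrow_{\mathcal{R}} = C_i[r_1[t_{i,1}^*\sigma]\downarrow_{\mathcal{R}}] \prec C_i[r_2[t_{i,1}^*\sigma]] \preceq D_i[r_2[t_{i,1}^*\sigma]] = r_2[s_i^*\sigma].$

But this contradicts $\mathcal{R} \models_{ind} r_1[s_i^*\sigma] = r_2[s_i^*\sigma]$ by the irreflexivity of $\prec$. □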

Now we prove that if $r_1 = r_2$ maintains an ordering, then $c_{r_1=r_2}$ is indeed exact.

Theorem 8 (Equations where the correctness predicate is exact). Let
$\mathcal{R}$, $C$, $\prec$ be as in Def. 6 and let $r_1 = r_2$ be an equation which is radical and maintains
some ordering $\prec$ under $C$ w.r.t. $\mathcal{R}$. Moreover, let $c_{r_1=r_2}$ be a correctness
predicate for $r_1 = r_2$ under $C$ and let $\mathcal{R}$ also contain the rules defining $c_{r_1=r_2}$.
Then $\mathcal{R} \models_{ind} r_1 = r_2 \Leftrightarrow c_{r_1=r_2}(y^*) = true$.^

^ A more general version of this theorem can be proved in which a conjecture does
not have to be radical, and further, it is not necessary for the induction scheme of a
cover set to have at most one induction hypothesis in every subgoal.







Proof. Due to Thm. 5 (a) we only have to prove $\mathcal{R} \models_{ind} r_1[q^*] = r_2[q^*] \Rightarrow
c_{r_1=r_2}(q^*) = true$ for all constructor ground term tuples $q^*$. Again, we use induction
on $<_C$. Let $\mathcal{R} \models_{ind} r_1[q^*] = r_2[q^*]$.

By the completeness of $C$, there exists some $(s^*, \{t_1^*, \ldots, t_n^*\}) \in C$ and some
substitution $\sigma$ such that $q^* = s^*\sigma$. If $n = 0$, then we have the rule $c_{r_1=r_2}(s^*) \to
true$ since the rule $c_{r_1=r_2}(s^*) \to false$ would only be generated if $\mathcal{R} \models_{ind} \neg\, r_1[s^*] =
r_2[s^*]$. This implies $\mathcal{R} \models_{ind} c_{r_1=r_2}(q^*) = true$.

Otherwise, if $n = 1$, by Lemma 7 the truth of $r_1[s^*\sigma] = r_2[s^*\sigma]$ implies
$\mathcal{R} \models_{ind} r_1[t_1^*\sigma] = r_2[t_1^*\sigma]$. So $\mathcal{R} \models_{ind} c_{r_1=r_2}(t_1^*\sigma) = true$ by the induction hypothesis.
By the rule $c_{r_1=r_2}(s^*) \to c_{r_1=r_2}(t_1^*)$, we obtain $\mathcal{R} \models_{ind} c_{r_1=r_2}(s^*\sigma) = true$. □

Let us consider the counterexample of Thm. 5 (b) again. When trying to
prove half(y) = s(0), we obtain $C_1 = 0$, $D_1 = s(0)$ and $C_2 = 0$, $D_2 = s(0)$.
In the step case, the left-hand side half(s(s(x))) evaluates to s(half(x)), i.e., we
have $C_3 = s(\Box)$, whereas $D_3 = \Box$. There does not exist an ordering $\prec$ such that
$C_i[x^*] \preceq D_i[x^*]$ for all $i$, since $C_1 \preceq D_1$ would imply $0 \prec s(0)$ and $C_3[0] \preceq D_3[0]$
would imply $s(0) \prec 0$ which contradicts the transitivity and irreflexivity of $\prec$.
Thus, half(y) = s(0) does not maintain any ordering under $C_{half}$ and indeed, its
correctness predicate is not exact as shown in Thm. 5 (b).

The above analysis of exactness of correctness predicates can be useful for 
fixing faulty conjectures, an objective for which correctness predicates were intro- 
duced by Protzen [9] . Since an exact correctness predicate precisely characterizes 
all instantiations on which the faulty conjecture is true, it can be used to fix the 
faulty conjecture into the “strongest theorem” possible. 



5 Conjectures Where Inductive Validity Is Decidable 



Now we extend Thm. 2 from equations to arbitrary quantifier-free formulas $\varphi$. We
require that all equations $r_1 = r_2$ occurring in $\varphi$ are radical and maintain some
ordering under the same cover set $C$.^ Then by Thm. 8 their correctness predicates
$c_{r_1=r_2}$ are sound and exact. For example, half(y) = 0 is radical and maintains
the superterm ordering under $C_{half}$. We obtain the correctness predicate

$c_{half(y)=0}(0) \to true$, $c_{half(y)=0}(s(0)) \to true$, $c_{half(y)=0}(s(s(x))) \to false$.

The last rule is due to the fact that the instantiated left-hand side half(s(s(x)))
evaluates to s(half(x)) and the replacement of the subterm half(x) according to
the induction hypothesis yields the equation s(0) = 0 which is unsatisfiable.

^ Different equations in a conjecture may have to be proved using different cover
sets; these cover sets can often be combined into a single cover set to generate a
single induction scheme using merging and instantiation (cf. [2,7]). Further, it is
not necessary for different equations to maintain the same monotonic ordering. For
instance, in the running example of this section two different orderings are used in
a conjecture.

Given a correctness predicate $c_\varphi$ we can generate $c_{\neg\varphi}$ by replacing the result
true by false and the result false by true whereas right-hand sides of the form
$c_\varphi(t^*)$ are replaced by $c_{\neg\varphi}(t^*)$. In the above example this yields

$c_{\neg half(y)=0}(0) \to false$, $c_{\neg half(y)=0}(s(0)) \to false$, $c_{\neg half(y)=0}(s(s(x))) \to true$.

This correctness predicate is sound and exact for the conjecture $\neg$half(y) = 0.

As stated before, exact correctness predicates can also be generated for non- 
radical equations, as well as for equations whose validity is decided using induc- 
tion schemes with multiple induction hypotheses. Thus, inductive validity of a 
much larger class of literals (equations and negated equations) can be decided 
using arbitrary well-founded complete cover sets without the requirement of rad- 
icality. The restrictions to radical equations and to induction schemes involving 
at most one induction step in every subgoal are needed only for the decidability 
of conjunctions and disjunctions of conjectures as discussed below. 

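Given $c_{\varphi_1}$ and $c_{\varphi_2}$, a straightforward idea to obtain rules for $c_{\varphi_1 \wedge \varphi_2}$ is as
follows: If we have the rule $c_{\varphi_i}(s^*) \to false$ for some $i \in \{1,2\}$, then we also
obtain the rule $c_{\varphi_1 \wedge \varphi_2}(s^*) \to false$. If we have the rules $c_{\varphi_i}(s^*) \to true$ for
both $i \in \{1,2\}$, then we obtain $c_{\varphi_1 \wedge \varphi_2}(s^*) \to true$. Finally, if we have the rule
$c_{\varphi_i}(s^*) \to c_{\varphi_i}(t^*)$ and either $c_{\varphi_j}(s^*) \to c_{\varphi_j}(t^*)$ or $c_{\varphi_j}(s^*) \to true$ (for $i,j \in
\{1,2\}$, $i \ne j$), then we also obtain the rule $c_{\varphi_1 \wedge \varphi_2}(s^*) \to c_{\varphi_1 \wedge \varphi_2}(t^*)$. But as the
following example illustrates, such a simplistic construction does not work.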

Recall the rules (5) - (7) for $c_{double(half(y))=y}$. We would obtain the following
correctness predicate for the formula $\varphi$: double(half(y)) = y $\wedge$ $\neg$half(y) = 0.

$c_\varphi(0) \to false$, $c_\varphi(s(0)) \to false$, $c_\varphi(s(s(x))) \to c_\varphi(x)$.

However, this correctness predicate is not exact, since it is always false,
whereas $\varphi$ is true for all even numbers greater than 0. Even worse, the resulting
correctness predicate for the negated conjecture $\neg\varphi$ would not even be sound
(since it would always be true whereas $\neg\varphi$ holds only for 0 and all odd numbers).
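
The failure of the simplistic construction can be observed directly. The following sketch is an illustration under the assumption that natural numbers are modelled by Python integers; `c_naive` is a hypothetical name for the predicate $c_\varphi$ built by the naive rules above.

```python
def half(n):
    # half(0) -> 0, half(s(0)) -> 0, half(s(s(x))) -> s(half(x))
    return 0 if n < 2 else 1 + half(n - 2)

def double(n):
    # double(0) -> 0, double(s(x)) -> s(s(double(x)))
    return 0 if n == 0 else 2 + double(n - 1)

def phi(n):
    # The conjecture double(half(y)) = y AND NOT half(y) = 0.
    return double(half(n)) == n and half(n) != 0

def c_naive(n):
    # Naively combined rules: both base cases yield false,
    # the step case recurses on x for the argument s(s(x)).
    if n in (0, 1):
        return False
    return c_naive(n - 2)

true_instances = [n for n in range(10) if phi(n)]
accepted = [n for n in range(10) if c_naive(n)]
```

Every recursion of `c_naive` ends in one of the two base cases, so the predicate never returns true although the conjecture holds for 2, 4, 6, and so on.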

The problem with the above construction of $c_{\varphi_1 \wedge \varphi_2}$ is the case where one
rule $c_{\varphi_i}(s^*) \to c_{\varphi_i}(t^*)$ leads to a recursive call, but the other has the form
$c_{\varphi_j}(s^*) \to true$. If we use the rule $c_{\varphi_1 \wedge \varphi_2}(s^*) \to c_{\varphi_1 \wedge \varphi_2}(t^*)$, then we may lose the
exactness of the correctness predicate, since it could be that $c_{\varphi_j}(t^*) \to^* false$.

To avoid this problem, we will now construct so-called basic correctness predicates
(denoted $b_{r_1=r_2}$) where for recursive pairs $(s^*, \{t^*\}) \in C$ we always have
recursive rules $b_{r_1=r_2}(s^*) \to b_{r_1=r_2}(t^*)$, but never a rule with the result false.

Fortunately, if $r_1 = r_2$ is radical and maintains an ordering under $C$, one can
easily obtain a basic correctness predicate by simply extending the cover set $C$ in
an appropriate way. For that purpose we have to restrict ourselves to cover sets
where for any two recursive pairs $(s_i^*, \{t_i^*\}), (s_j^*, \{t_j^*\}) \in C$ with $i \ne j$, the terms
$t_i^*$ and $s_j^*$ do not unify (after renaming their variables). In other words, the arguments
$t^*$ in an induction hypothesis must not unify with the arguments $s^*$ in any
other induction conclusion. The cover set $C_{half} = \{(0, \emptyset), (s(0), \emptyset), (s(s(x)), \{x\})\}$
trivially satisfies this condition, since there is only one recursive pair. The motivation
for this restriction is that for all chains $q_0^* <_C q_1^* <_C \ldots <_C q_k^*$ it ensures
$c_\varphi(q_k^*) = \ldots = c_\varphi(q_1^*)$. So a change in the value of $c_\varphi$ can only occur in the last
value $q_0^*$, which corresponds to a base case (i.e., we might have $c_\varphi(q_1^*) \ne c_\varphi(q_0^*)$).
Our aim is to extend $C$ to a cover set $C'$ where $q_1^*$ is already a base case. Then
for all chains $q_1^* <_{C'} \ldots <_{C'} q_k^*$ we have $c_\varphi(q_1^*) = \ldots = c_\varphi(q_k^*)$ and thus, we can
indeed use the rule $c_\varphi(s^{*\prime}) \to c_\varphi(t^{*\prime})$ for all recursive pairs of $C'$.

The idea for the extension of cover sets is simply to unify the terms $t_i^*$ of
the induction hypotheses with the (variable-renamed) terms $s_j^*$ in the left components
of all pairs from $C$. Let $\mu_{ij}$ be the respective mgu's. Then every pair
$(s_i^*, \{t_i^*\})$ is replaced by the new non-recursive pairs $(s_i^*\mu_{ij}, \emptyset)$ for $j \ne i$ and the
instantiated recursive pair $(s_i^*\mu_{ii}, \{t_i^*\mu_{ii}\})$. For $C_{half}$ we obtain

$C'_{half} = \{(0, \emptyset), (s(0), \emptyset), (s(s(0)), \emptyset), (s(s(s(0))), \emptyset), (s(s(s(s(x)))), \{s(s(x))\})\}$.
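
The extension step can be sketched in code. The following Python fragment is an illustration under several assumptions: constructor terms are nested tuples, every pair of the cover set has a single argument term and at most one hypothesis, and the helper names (`mgu`, `rename`, `extend`) are hypothetical. The unification routine is a standard one and is not taken from the paper.

```python
ZERO = ("0",)

def s(t):
    return ("s", t)

def is_var(t):
    return isinstance(t, str)

def apply(t, sub):
    # Apply a substitution (dict: variable -> term) exhaustively.
    if is_var(t):
        return apply(sub[t], sub) if t in sub else t
    return (t[0],) + tuple(apply(a, sub) for a in t[1:])

def occurs(v, t):
    return v == t if is_var(t) else any(occurs(v, a) for a in t[1:])

def mgu(a, b, sub=None):
    # Most general unifier of a and b, or None if they do not unify.
    sub = dict(sub or {})
    a, b = apply(a, sub), apply(b, sub)
    if a == b:
        return sub
    if is_var(a) or is_var(b):
        v, t = (a, b) if is_var(a) else (b, a)
        if occurs(v, t):
            return None
        sub[v] = t
        return sub
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        sub = mgu(x, y, sub)
        if sub is None:
            return None
    return sub

def rename(t):
    # Variable renaming: append a prime to every variable name.
    return t + "'" if is_var(t) else (t[0],) + tuple(rename(a) for a in t[1:])

def extend(cover):
    # cover: list of (s_i, hyps_i), hyps_i a list with at most one term.
    extended = [(s_i, []) for s_i, hyps in cover if not hyps]
    for i, (s_i, hyps_i) in enumerate(cover):
        if not hyps_i:
            continue
        t_i = hyps_i[0]
        for j, (s_j, hyps_j) in enumerate(cover):
            mu = mgu(t_i, rename(s_j))
            if mu is None:
                continue
            if not hyps_j:
                extended.append((apply(s_i, mu), []))
            elif i == j:
                extended.append((apply(s_i, mu), [apply(t_i, mu)]))
            else:
                raise ValueError("hypothesis unifies with another recursive pair")
    return extended

C_half = [(ZERO, []), (s(ZERO), []), (s(s("x")), ["x"])]
C_half_ext = extend(C_half)
```

For `C_half` this yields the five pairs listed above, with the variable of the recursive pair renamed to `x'`.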



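Definition 9 (Extending cover sets). Let $C = \{(s_1^*, \{t_{1,1}^*, \ldots, t_{1,n_1}^*\}), \ldots,
(s_m^*, \{t_{m,1}^*, \ldots, t_{m,n_m}^*\})\}$ be a cover set with $0 \le n_i \le 1$, such that if $n_i = n_j = 1$
and $i \ne j$ then there do not exist substitutions $\mu_{ij}$ with $t_{i,1}^*\mu_{ij} = s_j^*\nu\mu_{ij}$ for a
variable renaming $\nu$. Then the extended cover set $C'$ is defined as follows:

$C' = \{(s_i^*, \emptyset) \mid n_i = 0\}$
$\cup\ \{(s_i^*\mu_{ij}, \emptyset) \mid n_i = 1, n_j = 0, \mu_{ij} = mgu(t_{i,1}^*, s_j^*\nu)$ for a variable renaming $\nu\}$
$\cup\ \{(s_i^*\mu_{ii}, \{t_{i,1}^*\mu_{ii}\}) \mid n_i = 1, \mu_{ii} = mgu(t_{i,1}^*, s_i^*\nu)$ for a variable renaming $\nu\}$.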



Obviously, if $C$ is complete and well founded, then the extension $C'$ is complete
and well founded, too. Moreover, if an equation $r_1 = r_2$ is radical and maintains
an ordering under $C$, then it is also radical and maintains the same ordering under
the extension $C'$. In this case we can construct the basic correctness predicate
by taking the extension $C'$ and by using the results true and false in its non-recursive
cases and by using the rule $b_{r_1=r_2}(s^*) \to b_{r_1=r_2}(t^*)$ for all recursive
pairs $(s^*, \{t^*\})$. Note that only one such extension step for cover sets $C$ is already
enough: If a correctness predicate $b$ has a non-recursive rule $b(s^*) \to true$ or
$b(s^*) \to false$ for a recursive pair $(s^*, \{t^*\}) \in C$, then a single extension step of $C$
suffices to get recursive rules $b(s^{*\prime}) \to b(t^{*\prime})$ for all recursive pairs $(s^{*\prime}, \{t^{*\prime}\})$ of
the extended cover set $C'$. In our example we obtain



$b_{half(y)=0}(0) \to true$,
$b_{half(y)=0}(s(0)) \to true$,
$b_{half(y)=0}(s^2(0)) \to false$,
$b_{half(y)=0}(s^3(0)) \to false$,
$b_{half(y)=0}(s^4(x)) \to b_{half(y)=0}(s^2(x))$,

$b_{double(half(y))=y}(0) \to true$,
$b_{double(half(y))=y}(s(0)) \to false$,
$b_{double(half(y))=y}(s^2(0)) \to true$,
$b_{double(half(y))=y}(s^3(0)) \to false$,
$b_{double(half(y))=y}(s^4(x)) \to b_{double(half(y))=y}(s^2(x))$.



Now indeed basic correctness predicates for conjunctions are constructed by 
using the result false if one of the conjuncts yields false and true if both conjuncts 
yield true. If one (and therefore, both) conjuncts have a recursive call, then the 
basic correctness predicate for the conjunction has a recursive call, too. So if $\varphi$
is again the formula double(half(y)) = y $\wedge$ $\neg$half(y) = 0, then we have



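$b_{\neg half(y)=0}(0) \to false$,
$b_{\neg half(y)=0}(s(0)) \to false$,
$b_{\neg half(y)=0}(s^2(0)) \to true$,
$b_{\neg half(y)=0}(s^3(0)) \to true$,
$b_{\neg half(y)=0}(s^4(x)) \to b_{\neg half(y)=0}(s^2(x))$,

$b_\varphi(0) \to false$,
$b_\varphi(s(0)) \to false$,
$b_\varphi(s^2(0)) \to true$,
$b_\varphi(s^3(0)) \to false$,
$b_\varphi(s^4(x)) \to b_\varphi(s^2(x))$.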



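Definition 10 (Basic Correctness Predicates). Let $\mathcal{R}$ be a convergent sufficiently
complete CS and let $C$ be a complete well-founded cover set such that
for all $(s^*, \{t_1^*, \ldots, t_n^*\}) \in C$, we have $0 \le n \le 1$, and for two different pairs
$(s^*, \{t^*\}), (s^{*\prime}, \{t^{*\prime}\}) \in C$, there does not exist a substitution $\mu$ with $t^*\mu = s^{*\prime}\nu\mu$
for a variable renaming $\nu$. Let $\varphi$ be a quantifier-free formula such that all equations
in $\varphi$ are radical and maintain some ordering under $C$ w.r.t. $\mathcal{R}$.

Let $C' = \{(s_1^*, \{t_{1,1}^*, \ldots, t_{1,n_1}^*\}), \ldots, (s_m^*, \{t_{m,1}^*, \ldots, t_{m,n_m}^*\})\}$ be the extension
of $C$ and let $r_1[s_i^*] \to_{\mathcal{R}}^* C_i[r_1[t_{i,1}^*], \ldots, r_1[t_{i,n_i}^*]]$ for a constructor ground context
$C_i$. Then the basic correctness predicate $b_\varphi$ under $C$ is defined by the following
rules (analogous rules are used for formulas containing $\vee$, $\Rightarrow$, $\Leftrightarrow$):

$$b_{r_1=r_2}(s_i^*) \to \begin{cases} true, & \text{if } \mathcal{R} \models_{ind} C_i = r_2[s_i^*] \text{ and } n_i = 0, \\ false, & \text{if } \mathcal{R} \models_{ind} \neg\, C_i = r_2[s_i^*] \text{ and } n_i = 0, \\ b_{r_1=r_2}(t_{i,1}^*), & \text{if } n_i = 1, \end{cases}$$

$$b_{\neg\varphi'}(s_i^*) \to \begin{cases} true, & \text{if we have the rule } b_{\varphi'}(s_i^*) \to false, \\ false, & \text{if we have the rule } b_{\varphi'}(s_i^*) \to true, \\ b_{\neg\varphi'}(t_{i,1}^*), & \text{if we have the rule } b_{\varphi'}(s_i^*) \to b_{\varphi'}(t_{i,1}^*), \end{cases}$$

$$b_{\varphi_1 \wedge \varphi_2}(s_i^*) \to \begin{cases} true, & \text{if } b_{\varphi_1}(s_i^*) \to true \text{ and } b_{\varphi_2}(s_i^*) \to true, \\ false, & \text{if } b_{\varphi_1}(s_i^*) \to false \text{ or } b_{\varphi_2}(s_i^*) \to false, \\ b_{\varphi_1 \wedge \varphi_2}(t_{i,1}^*), & \text{if } b_{\varphi_1}(s_i^*) \to b_{\varphi_1}(t_{i,1}^*) \text{ and } b_{\varphi_2}(s_i^*) \to b_{\varphi_2}(t_{i,1}^*). \end{cases}$$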

Now we can present the main theorem which shows that the inductive validity 
of arbitrary quantifier-free conjectures is decidable, if all their equations are 
radical and maintain an ordering under C. The decision procedure works by 
constructing the basic correctness predicate and by checking whether it always 
yields true. The reason for the soundness of this approach is that basic correctness 
predicates are indeed sound and exact. 

Theorem 11 (Decidability of inductive validity for arbitrary conjectures).
Let $\mathcal{R}$, $C$, $\varphi$ be as in Def. 10. Then inductive validity of $\varphi$ is decidable
(by checking whether all non-recursive rules of $b_\varphi$ have the right-hand side true,
where $b_\varphi$ is the basic correctness predicate for $\varphi$ under $C$).

Proof. We have to show that $b_\varphi$ is sound and exact, i.e., $\mathcal{R} \models_{ind} \varphi \Leftrightarrow b_\varphi(y^*) =
true$ if $\mathcal{R}$ also contains the rules defining $b_\varphi$. We use an induction w.r.t. the
structure of $\varphi$. First let $\varphi$ be an equation $r_1 = r_2$.

Let $q^*$ be a tuple of constructor ground terms. We prove $\mathcal{R} \models_{ind} r_1[q^*] =
r_2[q^*] \Leftrightarrow b_{r_1=r_2}(q^*) = true$ by induction w.r.t. $<_{C'}$. Since $C$ is complete and
well founded, obviously its extension $C'$ is complete and well founded, too. Due
to the completeness of $C'$, there exists some $(s^*, \{t_1^*, \ldots, t_n^*\}) \in C'$ and some
substitution $\sigma$ such that $q^* = s^*\sigma$. If $n = 0$, then the claim follows from radicality
of $r_1 = r_2$ under $C$ and thus, under $C'$ as well.

If $n = 1$ and $\mathcal{R} \models_{ind} r_1[s^*\sigma] = r_2[s^*\sigma]$ then by Lemma 7 we also have
$\mathcal{R} \models_{ind} r_1[t_1^*\sigma] = r_2[t_1^*\sigma]$ since $r_1 = r_2$ maintains an ordering under $C$ and thus,
under $C'$ as well. The induction hypothesis yields $\mathcal{R} \models_{ind} b_{r_1=r_2}(t_1^*\sigma) = true$ and
thus, $\mathcal{R} \models_{ind} b_{r_1=r_2}(s^*\sigma) = true$ as well.

Finally, let $n = 1$ and $\mathcal{R} \models_{ind} \neg\, r_1[s^*\sigma] = r_2[s^*\sigma]$. We have to show that this
implies $\mathcal{R} \models_{ind} \neg\, r_1[t_1^*\sigma] = r_2[t_1^*\sigma]$. Then the induction hypothesis would yield
$\mathcal{R} \models_{ind} b_{r_1=r_2}(t_1^*\sigma) = false$ and thus, $\mathcal{R} \models_{ind} b_{r_1=r_2}(s^*\sigma) = false$ as well.

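Note that $s^* = s^{*\prime}\mu$ and $t_1^* = t_1^{*\prime}\mu$ for some $(s^{*\prime}, \{t_1^{*\prime}\}) \in C$ by the definition
of extensions. Moreover, by the requirement that arguments of induction
hypotheses may not unify with arguments of other induction conclusions we
also have that $t_1^* = t_1^{*\prime}\mu = s^{*\prime}\nu\mu$ by the definition of extensions. Since $r_1 = r_2$
maintains an ordering under $C$ we have $r_1[s^{*\prime}] \to_{\mathcal{R}}^* C_i'[r_1[t_1^{*\prime}]]$ for a constructor
ground context $C_i'$. As $r_1[s^*] \to_{\mathcal{R}}^* C_i[r_1[t_1^*]]$ this means that $C_i' = C_i$ or, in
other words, $r_1[s^{*\prime}] \to_{\mathcal{R}}^* C_i[r_1[t_1^{*\prime}]]$. Radicality of $r_1 = r_2$ under $C$ implies that
$\mathcal{R} \models_{ind} C_i[r_2[t_1^{*\prime}]] = r_2[s^{*\prime}]$ or $\mathcal{R} \models_{ind} \neg\, C_i[r_2[t_1^{*\prime}]] = r_2[s^{*\prime}]$.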

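First assume $\mathcal{R} \models_{ind} C_i[r_2[t_1^{*\prime}]] = r_2[s^{*\prime}]$. This implies $\mathcal{R} \models_{ind} (C_i[r_2[t_1^{*\prime}]] =
r_2[s^{*\prime}])\mu$, i.e., $\mathcal{R} \models_{ind} C_i[r_2[t_1^*]] = r_2[s^*]$. If we had $\mathcal{R} \not\models_{ind} \neg\, r_1[t_1^*\sigma] = r_2[t_1^*\sigma]$
(i.e., $\mathcal{R} \models_{ind} (r_1[t_1^*] = r_2[t_1^*])\sigma\tau$ for some $\tau$), then we would also have $\mathcal{R} \models_{ind}
(C_i[r_1[t_1^*]] = r_2[s^*])\sigma\tau$. Since $r_1[s^*] \to_{\mathcal{R}}^* C_i[r_1[t_1^*]]$, this implies $\mathcal{R} \models_{ind} (r_1[s^*] =
r_2[s^*])\sigma\tau$ in contradiction to the prerequisite $\mathcal{R} \models_{ind} \neg\, r_1[s^*\sigma] = r_2[s^*\sigma]$.

Thus, $\mathcal{R} \models_{ind} \neg\, C_i[r_2[t_1^{*\prime}]] = r_2[s^{*\prime}]$. Again assume $\mathcal{R} \models_{ind} (r_1[t_1^*] = r_2[t_1^*])\sigma\tau$
for some $\tau$. Since $t_1^*\sigma\tau = s^{*\prime}\nu\mu\sigma\tau$, we have $\mathcal{R} \models_{ind} (r_1[s^{*\prime}] = r_2[s^{*\prime}])\nu\mu\sigma\tau$ and
since $r_1 = r_2$ maintains an ordering under $C$, this implies $\mathcal{R} \models_{ind} (r_1[t_1^{*\prime}] =
r_2[t_1^{*\prime}])\nu\mu\sigma\tau$ by Lemma 7. By the prerequisite, this yields $\mathcal{R} \models_{ind} (\neg\, C_i[r_1[t_1^{*\prime}]] =
r_2[s^{*\prime}])\nu\mu\sigma\tau$. However since $r_1[s^{*\prime}] \to_{\mathcal{R}}^* C_i[r_1[t_1^{*\prime}]]$ this is equivalent to $\mathcal{R} \models_{ind}
(\neg\, r_1[s^{*\prime}] = r_2[s^{*\prime}])\nu\mu\sigma\tau$, which contradicts the assumption (as $t_1^*\sigma\tau = s^{*\prime}\nu\mu\sigma\tau$).

For formulas which are no equations, the claim immediately follows from the
(outer) induction hypothesis. □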

Note that the conditions in Thm. 11 (i.e., radicality and maintenance of or- 
derings) can be checked automatically (by using orderings from the area of term 
rewrite systems which are amenable to automation). The set of all conjectures $\varphi$
satisfying these conditions forms a class where inductive validity is decidable. To
decide inductive validity of $\varphi$ one simply constructs the rules for the basic correctness
predicate $b_\varphi$ (which can be done automatically) and one checks whether
there is no rule of the form $b_\varphi(\ldots) \to false$.
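
To make the procedure concrete, the rule tables of the running example over the extended cover set of $C_{half}$ can be combined and inspected as follows. This Python sketch is illustrative only: the list encoding (one entry per pair of the extended cover set, in the order 0, s(0), $s^2(0)$, $s^3(0)$, $s^4(x)$), the marker `REC` for a recursive rule, and all function names are assumptions, and the rule tables of the two equations are entered by hand instead of being derived from the rewrite system.

```python
TRUE, FALSE, REC = "true", "false", "rec"

def b_not(rules):
    flip = {TRUE: FALSE, FALSE: TRUE, REC: REC}
    return [flip[r] for r in rules]

def b_and(rules1, rules2):
    out = []
    for a, b in zip(rules1, rules2):
        if a == FALSE or b == FALSE:
            out.append(FALSE)
        elif a == TRUE and b == TRUE:
            out.append(TRUE)
        elif a == REC and b == REC:
            out.append(REC)
        else:
            raise ValueError("mixed rule kinds; extend the cover set first")
    return out

def valid(rules):
    # Decision: every non-recursive rule must have the result true.
    return all(r == TRUE for r in rules if r != REC)

def evaluate(rules, n):
    # Run the predicate on the number n: s^4(x) recurses to s^2(x).
    while n >= 4:
        n -= 2
    return rules[n] == TRUE

def half(n):
    return 0 if n < 2 else 1 + half(n - 2)

def double(n):
    return 0 if n == 0 else 2 + double(n - 1)

b_dh = [TRUE, FALSE, TRUE, FALSE, REC]   # double(half(y)) = y
b_h0 = [TRUE, TRUE, FALSE, FALSE, REC]   # half(y) = 0
b_phi = b_and(b_dh, b_not(b_h0))         # double(half(y)) = y AND NOT half(y) = 0
b_taut = b_not(b_and(b_dh, b_not(b_dh))) # NOT (psi AND NOT psi)

exact = all(
    evaluate(b_phi, n) == (double(half(n)) == n and half(n) != 0)
    for n in range(30)
)
```

Here `valid(b_phi)` is false because some non-recursive rules yield false, whereas the tautology encoded in `b_taut` is recognized as valid. The flag `exact` compares the combined predicate with a direct evaluation of the conjecture on a finite range.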

So for a formula like $\mathsf{double}(y) = y \Rightarrow y = 0$, one first checks
whether this formula belongs to the class where inductive validity is decidable. For
that purpose, one examines whether the conjecture contains a subterm $f(y^*)$ for
pairwise distinct variables $y^*$ and an algorithm $f$, and then one checks whether
all equations in the conjecture are radical and maintain an ordering under $C_f$
(using the induction variables $y^*$).
In our example, the equations $\mathsf{double}(y) = y$ and $y = 0$ indeed are both
radical and they maintain the superterm ordering under $C_{\mathsf{double}}$. So
inductive validity of this conjecture is decidable. Writing $\varphi$ for
$\mathsf{double}(y) = y \Rightarrow y = 0$, the decision procedure constructs the
basic correctness predicate

$$b_\varphi(0) \to \mathsf{true},$$

$$b_\varphi(s(0)) \to \mathsf{true},$$

$$b_\varphi(s(s(x))) \to b_\varphi(s(x)),$$

and checks whether all non-recursive rules of $b_\varphi$ have $\mathsf{true}$ on their
right-hand side, which is obviously the case. Thus, the formula is valid.
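
The check described above can be illustrated by a minimal Python sketch. It is only an illustration under stated assumptions: Peano numerals are modelled as Python integers, `double` is assumed to be defined by double(0) = 0 and double(s(x)) = s(s(double(x))), and the rule encoding `RULES` as well as the helper names `decide`, `b` and `conjecture_holds` are hypothetical and not part of the paper.

```python
def double(n):
    """Assumed definition: double(0) = 0, double(s(x)) = s(s(double(x)))."""
    return 0 if n == 0 else 2 + double(n - 1)


def conjecture_holds(n):
    """Truth value of  double(n) = n  =>  n = 0  for one natural number n."""
    return double(n) != n or n == 0


# Hypothetical encoding of the three rules of the correctness predicate:
# (argument pattern, right-hand side).  A right-hand side is either a truth
# value (non-recursive rule) or a string standing for a recursive call.
RULES = [
    ("0", True),
    ("s(0)", True),
    ("s(s(x))", "b(s(x))"),
]


def decide(rules):
    """The conjecture is valid iff no non-recursive rule yields false."""
    return all(rhs for _, rhs in rules if isinstance(rhs, bool))


def b(n):
    """Evaluate the predicate on a numeral by following the rules above."""
    if n <= 1:          # rules for 0 and s(0)
        return True
    return b(n - 1)     # rule b(s(s(x))) -> b(s(x))


print(decide(RULES))
print(all(b(n) == conjecture_holds(n) for n in range(30)))
```

The second printed value compares the predicate with a direct evaluation of the conjecture on small numerals; it is a sanity check only, whereas `decide` mirrors the purely syntactic inspection of the rules.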

Note that in this way we can decide the inductive validity of conjectures
which were up to now hard problems for inductive theorem provers. In fact,
virtually all existing inductive provers fail in verifying
$\mathsf{double}(y) = y \Rightarrow y = 0$. The reason is that the induction
conclusion $\mathsf{double}(s(x)) = s(x) \Rightarrow s(x) = 0$ can be evaluated to
$\neg\, s(\mathsf{double}(x)) = x$, but there is no way to apply the induction
hypothesis $\mathsf{double}(x) = x \Rightarrow x = 0$ and thus, the proof of the
induction step case does not succeed. On the other hand, by our decision procedure,
validity of such conjectures can be shown without using any inductive theorem
prover at all.

6 Conclusion 

We presented a class of conjectures where inductive validity is decidable (by a 
very simple decision procedure). This allows an integration of inductive reasoning 
within fully automated tools like model checkers or compilers. First, we extended 
the results of [8] to a larger class of equations and subsequently, we extended 
the approach further to arbitrary quantifier-free conjectures. The main idea is 
to build correctness predicates for all equations occurring in a conjecture and we 
gave a criterion for checking whether these correctness predicates really describe 
the exact set of objects where the equation is valid. We showed how to construct 
(basic) correctness predicates for non-atomic formulas and by checking their 
defining rules, the inductive validity of such formulas can easily be decided. 

We have used correctness predicates $C_{r_1 = r_2}$ to describe the instances where
an equation $r_1 = r_2$ is valid. However, in order to combine the correctness
predicates $C_{r_1 = r_2}$ and $C_{r_1' = r_2'}$ of two different equations (e.g.,
when building their conjunction), we have to restrict ourselves to basic correctness
predicates and moreover, $C_{r_1 = r_2}$ and $C_{r_1' = r_2'}$ must have been built
w.r.t. "compatible" cover sets.
In order to avoid these difficulties, an interesting alternative approach is to rep- 
resent the set of instances where equations are valid by tree automata [4] instead 
of correctness predicates. As long as these sets of instances are regular, this 
indeed results in a very elegant method for deciding inductive validity (since 
regular languages are effectively closed under complement and intersection and 
since their emptiness is decidable) . However, in general there are many equations 
where the set of instances which makes them valid is not regular. For example,
the equation plus(minus(x, y), minus(y, x)) = 0 is valid iff x and y are equal. A
correctness predicate describing this set can easily be constructed automatically, 
whereas this set is not regular and therefore cannot be described by (ordinary) 
tree automata. This indicates that the use of tree automata may be too restric- 
tive compared to the use of (basic) correctness predicates. However, we intend to 
study the possibilities of using automata for deciding inductive validity further 
in future work. 

In this paper, we focused on integrating induction schemes with a decision 
procedure for the quantifier-free theory of free constructors to obtain an exten- 
sion of the decision procedure to quantifier-free formulas whose proofs (or dis- 
proofs) may require the use of induction. Kapur and Subramaniam [8] discussed 
an approach for integrating induction schemes into decidable quantifier-free the- 
ories including Presburger arithmetic, and they gave a decision procedure for 
inductive validity of a large class of equations involving T-based function sym- 
bols, where T is a decidable quantifier-free theory. In future work, we intend to 
generalize the techniques developed in this paper from constructor systems to 
T-based systems (including Presburger arithmetic) as well. 

Abstract. We propose the notion of rewriting modules in order to provide
a structural and hierarchical approach to TRS. We then define relative
dependency pairs built upon these modules, which allow us to perform
termination proofs incrementally. Important results can be expressed in
this new framework (regarding $C_{\mathcal{E}}$-termination for instance), and with
the help of $\pi$ extendable orderings, we give effective new incremental methods
for proving termination that are particularly suited for automation.



1 Introduction 

Rewriting is used for specification, programming or in automated proofs. Yet, 
if structuring is a paradigm of ‘clean’ programming, a TRS is still considered 
in practice as a single set of rules and termination proofs are run on the whole 
system without taking any benefit of its possible modular structure. 

Programs are (should be) developed in an incremental way: one defines some 
basic types or functions and then builds other functions which use the basic ones 
and so on. In other words, some functions are somehow created ‘upon’ previously 
defined ones. A hierarchical structure over functions naturally emerges from that 
incremental procedure. Since termination is a difficult issue — especially when 
it comes to automation of proofs — it would be useful to perform automated 
termination proofs incrementally, that is to use normalization information about 
‘basic’ rules to show that adding ‘new’ ones will preserve termination. 

Unfortunately, the termination property does not behave as well as we could
expect when dealing with unions of TRS. As shown by Toyama [16], if two TRS
are strongly normalizing, their union does not necessarily terminate, even if
the two systems do not share any symbol. The significant work of Gramlich
showed that projections were to blame and gave sufficient conditions for ensuring
termination of unions of TRS possibly sharing constructors [8]. But with regard
to hierarchical unions, that is unions of TRS sharing function symbols, fewer
results are known. Dershowitz proposed [6] conditions over a certain kind of
hierarchical unions, so-called constructor based unions, but those conditions
are too restrictive in practice and not well suited for automation.
Until 1997, the usual way of automatically proving termination of a TRS was
to show that the relation described by that system was included in a reduction
ordering, that is to show that each rule was decreasing w.r.t. the strict part of
the ordering. Since that ordering was well-founded, termination of the system
followed. However, the constraints thus induced required a search for orderings
that computers had difficulty performing (if any suitable reduction ordering
existed at all) and proofs were more likely run ‘by hand’ with ad hoc techniques.

In 1997 Arts & Giesl introduced the notion of dependency pairs [3,1],
weakening constraints, and thus replacing the previously needed reduction
ordering by a weak reduction ordering, more suited to automated search. Moreover,
dependency pairs make it possible to prove termination of non-simply terminating
TRS, that is of systems whose rules may be non-structurally recursive, like for
instance $f(x, \ldots) \to f(g(x), \ldots)$ where $g$ is not a constructor. The
dependency pair approach consists of a structural analysis of rules, requiring for
a suitable ordering that rules belong to its weak part and that only the so-called
dependency pairs strictly decrease. An approach consistent with our aim of
modularity would then be to refine that structural analysis of rules depending
on where the relevant rules lie within the hierarchy.

Thus, in order to give a general framework bringing the modular structure of
TRS to the fore, and so as to provide automated methods to prove termination
incrementally, we will first recall some generalities, then in Sec. 3 we shall
introduce rewrite modules, express unions of TRS in terms of modules and explain
how to exhibit the hierarchical structure of a TRS. Sec. 4 will deal with general
termination difficulties in unions and will provide orderings with good properties
with respect to projection (the so-called $\pi$ extendable orderings). Once the
framework of modules is settled, relative dependency pairs will be defined in Sec. 5
and that powerful tool will lead us to Sec. 6 and thus to new criteria for modular
termination (Thm 1 and Thm 2). Those results provide an incremental proof method
illustrated with the complete example of Sec. 7. We shall then discuss in Sec. 8
how this work compares to others, especially to results of Arts & Giesl [2] and
Dershowitz [6]. We will finally conclude and give ideas on how we intend to
apply and extend our framework.



2 Preliminaries 

We recall usual notions about rewriting [7], and give our notations. A signature
$\mathcal{F}$ is a finite set of symbols with arities. Let $X$ be a countable set of
variables; $T(\mathcal{F}, X)$ denotes the set of finite terms on $\mathcal{F}$ and
$X$. Terms can be seen as trees: the root position is denoted by $\Lambda$, the symbol
at root position in a term $t$ by $\Lambda(t)$, and $t|_p$ denotes the subterm of $t$
at position $p$. A substitution is a mapping $\sigma$ from variables to terms s.t.
$\{x \in X \mid \sigma(x) \neq x\}$ is finite. We use postfix notation for
substitution applications. A substitution can easily be extended to endomorphisms of
$T(\mathcal{F}, X)$: $f(t_1, \ldots, t_n)\sigma = f(t_1\sigma, \ldots, t_n\sigma)$.
$t\sigma$ is then called an instance of $t$.
A rewrite relation is a binary relation $\to$ on terms which is monotonic and
closed under substitution; $\to^*$ will denote its reflexive-transitive closure. A
term rewriting system (TRS for short) over a signature$^1$ $\mathcal{F}$ and a set of
variables $X$ is a set $R(\mathcal{F})$ of rewrite rules $l \to r$. A TRS $R$ defines
a rewrite relation $\to_R$ the following way: $s \to_R t$ if there is a position $p$
s.t. $s|_p = l\sigma$ and $t = s[r\sigma]_p$ for a rule $l \to r \in R$ and a
substitution $\sigma$. We then say that $s|_p$ is a redex and that $s$ reduces to $t$
at position $p$ with $l \to r$ or, if the rule of $R$ is not relevant, with $R$,
respectively denoted $s \xrightarrow[l \to r]{p} t$ and $s \to_R t$.

We restrict ourselves to the study of finitely branching TRS, i.e. TRS s.t.
the set of rules that can be applied to a term is always finite. A term is strongly
normalizable (SN) if it cannot reduce indefinitely. A rewrite relation is strongly
normalizing or terminates if any term is SN. Termination is usually proven with
the help of reduction orderings [5] or quasi-orderings with dependency pairs. We
briefly recall what we need. A term ordering, also known as ordering pair [10],
is a pair $(\succeq, \succ)$ of relations over $T(\mathcal{F}, X)$ such that:
$\succeq$ is a quasi-ordering, i.e. reflexive and transitive; $\succ$ is a strict
ordering, i.e. irreflexive and transitive, and s.t.
$\succ \cdot \succeq \ \subseteq\ \succ$ and $\succeq \cdot \succ \ \subseteq\ \succ$.
A term ordering is said to be
well-founded if there is no infinite strictly decreasing sequence
$t_1 \succ t_2 \succ \cdots$;
stable if both $\succeq$ and $\succ$ are stable under substitutions; weakly (resp.
strictly) monotonic if for all terms $t_1$ and $t_2$, for all $f \in \mathcal{F}$, if
$t_1 \succeq$ (resp. $\succ$) $t_2$ then $f(\ldots, t_1, \ldots) \succeq$ (resp.
$\succ$) $f(\ldots, t_2, \ldots)$. A term ordering $(\succeq, \succ)$ is called a
weak (resp. strict) reduction ordering if it is well-founded, stable and weakly
(resp. strictly) monotonic.



3 Rewrite Modules 

We define hierarchical extensions of TRS and introduce the notion of rewrite 
modules, thus providing a rather structural view of complex TRS that leads to 
an incremental approach. 



3.1 Extensions & Modules 

From an operational point of view, a module consists of new symbols together 
with rules that define them. 

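Definition 1. A module extending a system $R_1(\mathcal{F}_1)$ is a couple
$[\mathcal{F}_2 \mid R_2]$ s.t.:

1. $\mathcal{F}_1 \cap \mathcal{F}_2 = \emptyset$;

2. $R_2$ is a TRS over $\mathcal{F}_1 \cup \mathcal{F}_2$;

3. For each $l \to r \in R_2$, $\Lambda(l) \in \mathcal{F}_2$.

System $R_1 \cup R_2$ over $\mathcal{F}_1 \cup \mathcal{F}_2$ is then a hierarchical
extension of system $R_1(\mathcal{F}_1)$ by module $[\mathcal{F}_2 \mid R_2]$; we
denote that extension by $R_1(\mathcal{F}_1) \leftarrow [\mathcal{F}_2 \mid R_2]$.

$^1$ We shall omit the signature if there is no ambiguity.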
The usual notions of unions can be expressed in terms of modules in a
straightforward way. We will say that $[\mathcal{F}_1 \mid R_1]$ extends
$[\mathcal{F}_0 \mid R_0]$ regardless of $[\mathcal{F}_2 \mid R_2]$ if
$\mathcal{F}_1 \cap \mathcal{F}_2 = \emptyset$,
$[\mathcal{F}_0 \mid R_0] \leftarrow [\mathcal{F}_1 \mid R_1]$ and
$[\mathcal{F}_0 \mid R_0] \leftarrow [\mathcal{F}_2 \mid R_2]$; such an
extension is indeed a union of composable TRS [12,9,15]. We will talk about the
disjoint union $R_1 \cup R_2$ if $[\mathcal{F}_1 \mid R_1]$ and
$[\mathcal{F}_2 \mid R_2]$ extend $[\emptyset \mid \emptyset]$. The union will be
constructor sharing if $[\mathcal{F}_1 \mid R_1]$ and $[\mathcal{F}_2 \mid R_2]$
extend $[\mathcal{F}_0 \mid \emptyset]$.

A property $P$ is said to be modular for a specific kind of union if $R_1$ and $R_2$
having property $P$ implies that the relevant union of $R_1$ and $R_2$ satisfies $P$.

3.2 Modular Splitting up of TRS 

One may in practice provide a TRS as a hierarchy of modules and use it as it
is. In any case, every system $R(\mathcal{F})$ can be automatically split up as an
extension by minimal modules: the ones that cannot be seen as extensions of
non-empty systems (see Example 1 below). This is done in two steps:

1. Build a graph $\mathcal{G}$ over symbols of $\mathcal{F}$ s.t. there is an arc
from $x$ to $y$ if and only if there is a rule $l \to r \in R$ such that
$x = \Lambda(l)$ and $y$ occurs in $l$ or in $r$.

2. Pack together symbols occurring in strongly connected parts of $\mathcal{G}$,
that is symbols $f$ and $g$ such that $f \to^*_{\mathcal{G}} g$ and
$g \to^*_{\mathcal{G}} f$.

Now, let us consider ‘packs’ of $\mathcal{G}$ as signatures of our modules: we just
have to add the rules whose left-hand side root symbols are in those packs and we are
done. Note that there is no cycle in the resulting hierarchy since ‘mutually
recursively’ defined symbols were packed together.
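
The two steps above can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: terms are assumed to be encoded as nested tuples `(symbol, arg1, ..., argn)` with variables as plain strings, multiplication is written `*`, and all function names are hypothetical. Strongly connected components are computed with Tarjan's algorithm, which happens to emit the packs with dependencies first.

```python
def symbols(term):
    """Set of function symbols of a term; variables are plain strings."""
    if isinstance(term, str):
        return set()
    head, *args = term
    found = {head}
    for arg in args:
        found |= symbols(arg)
    return found


def dependency_graph(signature, rules):
    """Step 1: arc from x to y iff some rule rooted by x mentions y."""
    graph = {f: set() for f in signature}
    for lhs, rhs in rules:
        graph[lhs[0]] |= symbols(lhs) | symbols(rhs)
    return graph


def strongly_connected_components(graph):
    """Step 2: Tarjan's algorithm; components come out dependencies first."""
    index, low, stack, on_stack, components = {}, {}, [], set(), []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            pack = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                pack.add(w)
                if w == v:
                    break
            components.append(frozenset(pack))

    for v in graph:
        if v not in index:
            visit(v)
    return components


def split_into_modules(signature, rules):
    """Attach to each pack the rules whose left-hand side root lies in it."""
    graph = dependency_graph(signature, rules)
    return [(pack, [(l, r) for l, r in rules if l[0] in pack])
            for pack in strongly_connected_components(graph)]


# Peano addition and multiplication, as in Example 1.
PEANO_RULES = [
    (("+", "x", ("0",)), "x"),
    (("+", "x", ("s", "y")), ("s", ("+", "x", "y"))),
    (("*", "x", ("0",)), ("0",)),
    (("*", "x", ("s", "y")), ("+", "x", ("*", "x", "y"))),
]
MODULES = split_into_modules({"s", "0", "+", "*"}, PEANO_RULES)
for pack, pack_rules in MODULES:
    print(sorted(pack), len(pack_rules))
```

On this input each symbol forms its own pack; the constructor packs carry no rules and may be gathered afterwards, as Remark 1 suggests.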

Remark 1. For the sake of better readability we may afterwards gather
constructor symbols that can be reached from the same packs.

Example 1. Let us consider addition and multiplication à la Peano.

$$\mathcal{F} = \{s, 0, +, \times\}$$

$$R_+ = \{\, x + 0 \to x,\quad x + s(y) \to s(x + y) \,\}$$

$$R_\times = \{\, x \times 0 \to 0,\quad x \times s(y) \to x + (x \times y) \,\}$$

A natural (for programmers) and modular way of seeing it is to consider addition
built upon constructors, and multiplication upon addition and constructors.
Describing this TRS in the module framework consists in introducing $+$ with the
addition rules and then $\times$ with the multiplication rules. The extension scheme
is then:

$$[\{s, 0\} \mid \emptyset] \leftarrow [\{+\} \mid R_+] \leftarrow [\{\times\} \mid R_\times]$$

Note that those modules are minimal (with respect to Remark 1).

Further note that every system $R(\mathcal{F})$ can be seen that way as an extension
of the empty system over constructors of $\mathcal{F}$ by the module consisting of
defined symbols$^2$ (as signature part) together with all rules of $R$ (as rules
part).

$^2$ Constructors and defined symbols are here Arts & Giesl’s notions.
4 Termination and Unions of TRS

Unfortunately, termination is not a modular property of TRS, even for disjoint 
unions, as shown by Toyama’s famous counter-example [16]. 

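Example 2 (Toyama). These two TRS (over disjoint signatures) are terminating:

$$R_1 = \{\, f(0, 1, x) \to f(x, x, x) \,\}, \qquad R_2 = \{\, g(x, y) \to x,\quad g(x, y) \to y \,\}.$$

Their union nevertheless allows infinite reductions, for instance:

$$f(g(0,1), g(0,1), g(0,1)) \to_{R_2} f(0, g(0,1), g(0,1)) \to_{R_2} f(0, 1, g(0,1)) \to_{R_1} f(g(0,1), g(0,1), g(0,1)) \to_{R_2} \cdots$$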
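
The cyclic reduction of Toyama's counter-example can be replayed with a few lines of Python. This is only a sketch under an assumed encoding: terms are nested tuples, the two rule sets are hard-coded as the helper functions `step_r1` and `project` (hypothetical names), and only the three steps of the cycle are performed.

```python
ZERO, ONE = ("0",), ("1",)


def f(a, b, c):
    return ("f", a, b, c)


def g(a, b):
    return ("g", a, b)


def step_r1(term):
    """Apply f(0, 1, x) -> f(x, x, x) at the root, or return None."""
    if term[0] == "f" and term[1] == ZERO and term[2] == ONE:
        x = term[3]
        return f(x, x, x)
    return None


def project(term, position, keep):
    """Apply g(x, y) -> x (keep=1) or g(x, y) -> y (keep=2) to the argument
    of f found at the given position (1, 2 or 3)."""
    argument = term[position]
    assert argument[0] == "g"
    return term[:position] + (argument[keep],) + term[position + 1:]


start = f(g(ZERO, ONE), g(ZERO, ONE), g(ZERO, ONE))
t1 = project(start, 1, keep=1)   # f(0, g(0,1), g(0,1))
t2 = project(t1, 2, keep=2)      # f(0, 1, g(0,1))
t3 = step_r1(t2)                 # back to f(g(0,1), g(0,1), g(0,1))
print(t3 == start)
```

Since three steps lead back to the starting term, the reduction can be repeated forever, although neither rule set admits an infinite reduction on its own.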

One may observe that $R_2$ is not confluent, but similar counter-examples exist
for confluent TRS [16]. However, problems come indeed with $R_2$ and its
‘projective’ behaviour as studied by Gramlich [8]. Proposals to contain projections
led to the definition of $C_{\mathcal{E}}$-termination$^3$ [15,8].

Definition 2. Let $G$ be a new (regarding all involved signatures) symbol; we
denote by $\pi$ the projective TRS $\{G(x, y) \to x,\ G(x, y) \to y\}$.

A system $R$ is said to be $C_{\mathcal{E}}$-terminating if $R \cup \pi$ is
terminating.

That notion gives valuable results. In particular: $C_{\mathcal{E}}$-termination is
a modular property for disjoint unions of TRS [15,8], as well as for unions of
composable TRS as shown by Kurihara & Ohuchi [9].

Note that before the introduction of dependency pairs termination proofs 
were usually performed using simplification orderings (reduction ordering with 
subterm property), thus restricting automated proofs to simplifying TRS. 

Definition 3. A TRS $R$ over $\mathcal{F}$ is said to be simply terminating (or
simplifying) if
$R \cup \{f(x_1, \ldots, x_i, \ldots, x_n) \to x_i \mid 1 \leq i \leq n \text{ and } f \in \mathcal{F}\}$
is terminating.

Proposition 1 (Gramlich [8]). A simplifying TRS $C_{\mathcal{E}}$-terminates.

By Proposition 1, $C_{\mathcal{E}}$-termination lies between termination and simple
termination, and is thus far from being too restrictive a property in practice. The
example of Sec. 7 and, for instance, all examples by Arts & Giesl [1,2,3] (even
those that do not simply terminate) $C_{\mathcal{E}}$-terminate. In fact, all known
automated techniques actually prove $C_{\mathcal{E}}$-termination.

In order to specify a class of orderings making the most of
$C_{\mathcal{E}}$-termination, we define $\pi$ extendable orderings.

Definition 4. A term ordering $(\succeq, \succ)$ on $T(\mathcal{F}, X)$ is said to be
$\pi$ extendable if there is a reduction ordering $(\succeq', \succ')$ over
$T(\mathcal{F} \cup \{G\}, X)$ s.t.: 1) the restriction of $(\succeq', \succ')$ to
$T(\mathcal{F}, X)$ is exactly $(\succeq, \succ)$ and 2) $G(s, t) \succeq' s$ and
$G(s, t) \succeq' t$ for all $s$ and $t$ in $T(\mathcal{F} \cup \{G\}, X)$.

A $\pi$ extendable ordering is a strong $\pi$ extendable ordering if both
$(\succeq, \succ)$ and a suitable $(\succeq', \succ')$ are strictly monotonic.
Otherwise it is weak $\pi$ extendable.

Consequently, if $(\succeq, \succ)$ is a strictly monotonic $\pi$ extendable ordering
and if $l \succ r$ for all rules $l \to r$ of $R$, then $R$
$C_{\mathcal{E}}$-terminates.

$^3$ We use here the terminology introduced by Ohlebusch [14].

5 Relative Dependency Pairs 

Now that we have settled our modular approach, we would like to use it for proving
termination. Most methods dealing with unions of TRS use a notion of rank,
that is, they rely on how signatures are interlaced within a term [8]. We define
here relative dependency pairs, i.e. dependency pairs of modules, with two aims in
mind: firstly to ‘capture’ signature swaps by selecting relevant subterms, secondly
to ‘forget’ other defined symbols since the whole termination proof is supposed to
be performed incrementally.

Definition 5. Let $R_1(\mathcal{F}_1)$ and $[\mathcal{F}_2 \mid R_2]$ be such that
$R_1(\mathcal{F}_1) \leftarrow [\mathcal{F}_2 \mid R_2]$.

For each rule $l \to r \in R_2$, a pair $(l, r')$ where $r'$ is a subterm of $r$ such
that $\Lambda(r') \in \mathcal{F}_2$ is called a dependency pair of module
$[\mathcal{F}_2 \mid R_2]$.

The set of dependency pairs of all rules of a module $M$ is denoted $DP(M)$.

Example 3 (Example 1 revisited). Dependency pairs of $[\{\times\} \mid R_\times]$
consist in:

$$DP([\{\times\} \mid R_\times]) : \{\, (x \times s(y),\ x \times y) \,\}$$

Notice that $(x \times s(y),\ x + (x \times y))$ is not a dependency pair, unlike in
Arts & Giesl’s non-modular approach$^4$.
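
Definition 5 translates almost literally into code. The following Python sketch assumes the same hypothetical term encoding as before (nested tuples, variables as strings, `*` for multiplication); `relative_dependency_pairs` is an illustrative name, and root symbols of pairs are not marked.

```python
def subterms(term):
    """Yield a term and all of its subterms."""
    yield term
    if not isinstance(term, str):
        for argument in term[1:]:
            yield from subterms(argument)


def relative_dependency_pairs(module_symbols, module_rules):
    """Pairs (l, r') with r' a subterm of r rooted by a symbol of the module."""
    return [(lhs, sub)
            for lhs, rhs in module_rules
            for sub in subterms(rhs)
            if not isinstance(sub, str) and sub[0] in module_symbols]


R_TIMES = [
    (("*", "x", ("0",)), ("0",)),
    (("*", "x", ("s", "y")), ("+", "x", ("*", "x", "y"))),
]

# Relative pairs of the multiplication module: only the recursive call.
modular = relative_dependency_pairs({"*"}, R_TIMES)
# Treating + as a symbol of the same module (non-modular view) adds a pair.
non_modular = relative_dependency_pairs({"+", "*"}, R_TIMES)
print(len(modular), len(non_modular))
```

The comparison of the two calls shows the effect discussed in Example 3: the pair whose right component is rooted by `+` disappears once addition lives in a lower module.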

Remark 2. For a TRS seen as an extension of constructors by rules and defined 
symbols, relative dependency pairs and Arts & Giesl ones coincide. 

Definition 6. A dependency chain of a module $[\mathcal{F} \mid R]$ over $R'$ (with
$R \subseteq R'$) is a sequence of pairs $\ldots (s_i, t_i) \ldots$ of
$DP([\mathcal{F} \mid R])$ together with a substitution $\sigma$ such that for any
two successive pairs $(s_i, t_i)$, $(s_{i+1}, t_{i+1})$:

$$t_i\sigma \xrightarrow[R']{\neq\Lambda\ *} s_{i+1}\sigma.$$

We shall consider minimal dependency chains, in the sense that each proper
subterm of any $\sigma$-instantiated left-hand side is strongly normalizable by $R'$.

Proposition 2. Let $R$ be a TRS over $\mathcal{F}$, then $R$ is
$C_{\mathcal{E}}$-strongly normalizing if and only if there is no infinite
(dependency) chain of $[\mathcal{F} \mid R]$ over $R \cup \pi$.

Proof. Straightforward from Remark 2 and since $DP(R \cup \pi) = DP(R)$.

Corollary 1. Let $(\succeq, \succ)$ be a (weak) $\pi$ extendable ordering.

If $l \succeq r$ for all $l \to r \in R$ and $s \succ t$ for all $(s, t) \in DP(R)$,
then $R$ is $C_{\mathcal{E}}$-strongly normalizing.

$^4$ We do not consider here the restricted case of innermost termination [2] where
usable rules modify the set of dependency pairs.
6 Incremental Proofs

The module framework is particularly efficient when it comes to modular and
incremental proofs. We give our contributions for extensions of a system by,
firstly, one module and, secondly, two disjoint modules. Together, those apply to any
hierarchical scheme. We also obtain effective reformulations of previous results.

6.1 One System and One Module 

The basic case, typically a hierarchical extension of a system. Relative depen- 
dency pairs give us lighter conditions over systems to ensure termination. 

Theorem 1. Let $R_1(\mathcal{F}_1)$ and $[\mathcal{F}_2 \mid R_2]$ be such that
$[\mathcal{F}_1 \mid R_1] \leftarrow [\mathcal{F}_2 \mid R_2]$.

1. If $R_1$ is $C_{\mathcal{E}}$-strongly normalizing and

2. If there is no infinite dependency chain of $[\mathcal{F}_2 \mid R_2]$ over
$R_1 \cup R_2$;

Then $R_1 \cup R_2$ is strongly normalizing.

By contradiction: We will assume that there is an infinite dependency chain of
$R_1 \cup R_2$, then we will conclude either on an infinite dependency chain of
$[\mathcal{F}_2 \mid R_2]$ over $R_1 \cup R_2$, or on non
$C_{\mathcal{E}}$-termination of $R_1$, which contradicts the premises.

Let us suppose $R_1 \cup R_2$ non terminating. Thus, there is an infinite
dependency chain of $[\mathcal{F}_1 \cup \mathcal{F}_2 \mid R_1 \cup R_2]$ over
$R_1 \cup R_2$. Among pairs of
$DP([\mathcal{F}_1 \cup \mathcal{F}_2 \mid R_1 \cup R_2])$ are: pairs of
$[\mathcal{F}_1 \mid R_1]$; pairs of $[\mathcal{F}_2 \mid R_2]$; possibly
pairs $(l, r')$ s.t. $l \to r \in R_2$ and $r'$ is a subterm of $r$ with
$\Lambda(r')$ in $\mathcal{F}_1$.

On the first two cases we have knowledge. In order to get rid of the third kind, 
we need the following lemma. 

Lemma 1. Let $R_1(\mathcal{F}_1)$ and $[\mathcal{F}_2 \mid R_2]$ be such that
$[\mathcal{F}_1 \mid R_1] \leftarrow [\mathcal{F}_2 \mid R_2]$.

Then for each $(u_1, v_1) \in DP(R_1)$ and for each pair $(u_2, v_2)$ of
$[\mathcal{F}_2 \mid R_2]$ there is no substitution $\sigma$ such that:
$v_1\sigma \xrightarrow{\neq\Lambda\ *} u_2\sigma$.

Proof. Actually $\Lambda(u_2) \neq \Lambda(v_1)$ since
$\Lambda(u_2\sigma) = \Lambda(u_2) \in \mathcal{F}_2$ and
$\Lambda(v_1\sigma) = \Lambda(v_1) \in \mathcal{F}_1$.

According to Lemma 1, pairs of second and of third kinds can only follow pairs
of second kind, pairs of first kind can only follow pairs of first and of third kind.
Thus, we face three cases. The dependency chain contains either: only pairs of
second kind, that is pairs of $[\mathcal{F}_2 \mid R_2]$; or only pairs of first
kind, that is pairs of $[\mathcal{F}_1 \mid R_1]$; or a finite number of pairs (maybe
none) of second kind, then one pair of third kind and then an infinity of pairs of
first kind.

• First case: An infinite dependency chain of $[\mathcal{F}_2 \mid R_2]$ over
$R_1 \cup R_2$ contradicts the second premise of Theorem 1.

• Cases 2 & 3: In both cases we eventually get an infinite dependency chain
of $[\mathcal{F}_1 \mid R_1]$ over $R_1 \cup R_2$. We are now going to show how that
chain can be transformed into a chain of $[\mathcal{F}_1 \mid R_1]$ over
$R_1 \cup \pi$. Doing so, we will end up with an infinite chain of
$[\mathcal{F}_1 \mid R_1]$ over $R_1 \cup \pi$, that is an infinite chain of
$[\mathcal{F}_1 \cup \{G\} \mid R_1 \cup \pi]$, contradicting the first premise
stating that $R_1$ is $C_{\mathcal{E}}$-terminating.

We actually use (here and also in the proof of Theorem 2) a more general result
(Lemma 2) whose proof provides a way to build such a suitable chain.

Lemma 2. Let $S_1$ and $S_2$ be two TRS over $\mathcal{F}_1$. Let
$S_3(\mathcal{F}_1 \cup \mathcal{F}_2)$ be such that:

- $\mathcal{F}_1 \cap \mathcal{F}_2 = \emptyset$;

- For each $l \to r \in S_3$, $\Lambda(l) \in \mathcal{F}_2$.

Then from any infinite minimal chain of $[\mathcal{F}_1 \mid S_2]$ over
$S_1 \cup S_2 \cup S_3$, it is possible to build an infinite chain of
$[\mathcal{F}_1 \mid S_2]$ over $S_1 \cup S_2 \cup \pi$ with the same
sequence of pairs but new instantiation and rewriting steps.

The remaining part of the proof of Theorem 1 consists in applying Lemma 2 with
$R_1 = S_1 = S_2$ and $R_2 = S_3$. Hence for any infinite dependency chain of
$[\mathcal{F}_1 \mid R_1]$ over $R_1 \cup R_2$ we can build a corresponding chain of
$[\mathcal{F}_1 \mid R_1]$ over $R_1 \cup \pi$, that is a dependency chain of
$R_1 \cup \pi$. Since $R_1$ is supposed to be $C_{\mathcal{E}}$-terminating, we
reach a contradiction. Q.e.d.

Now we have to prove Lemma 2. The proof is rather technical and uses an
interpretation of terms akin to Gramlich’s [8]. We denote by
$T_\infty(\mathcal{F}, X)$ the set of infinite terms over signature $\mathcal{F}$
and a set $X$ of variables.

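Definition 7. Let $S$ be $S_1 \cup S_2 \cup S_3$. Let $>$ be an arbitrary yet total
ordering over $T(\mathcal{F}_1 \cup \mathcal{F}_2, X)$.

Interpretation
$I_S : T(\mathcal{F}_1 \cup \mathcal{F}_2, X) \to T_\infty(\mathcal{F}_1 \cup \{G : 2\} \cup \{\bot : 0\}, X)$
can be defined as:

$$I_S(x) = x \quad \text{for } x \in X,$$

$$I_S(f(t_1, \ldots, t_n)) = \begin{cases} f(I_S(t_1), \ldots, I_S(t_n)) & \text{if } f \in \mathcal{F}_1, \\ \mathrm{Comb}(\mathrm{Red}(f(t_1, \ldots, t_n))) & \text{if } f \in \mathcal{F}_2, \end{cases}$$

where

$$\mathrm{Red}(t) = \{\, I_S(t') \mid t \to_{S_1 \cup S_2 \cup S_3} t' \,\},$$

$$\mathrm{Comb}(\emptyset) = \bot,$$

$$\mathrm{Comb}(\{a\} \cup set) = G(a, \mathrm{Comb}(set)) \quad \text{where for all } x \in set,\ a < x.$$

$\mathrm{Comb}(E)$ is built from an unordered set $E$; the ordering $>$ is then
needed to remove any ambiguity.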

Remark 3. Please note that interpreted terms appear in a ‘comb-like’ shape.
Each ‘tooth’ being itself the interpretation of a one-step-reduced term, any of
these can be reached by an appropriate $\to^*_{\pi_2} \to_{\pi_1}$ reduction, where
$\pi_1$ and $\pi_2$ denote the rules $G(x, y) \to x$ and $G(x, y) \to y$ of $\pi$.
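
The comb construction and the way a tooth is reached by projections can be sketched as follows. This is a hedged illustration only: terms are nested tuples, the constant closing the comb is written `("bot",)`, the total ordering is arbitrarily taken to be the ordering on `repr` strings, and the names `comb` and `reach_tooth` are hypothetical.

```python
BOTTOM = ("bot",)   # assumed encoding of the constant that closes a comb


def comb(terms):
    """Build G(a1, G(a2, ... G(an, bot))) with a1 < a2 < ... < an.

    The ordering is arbitrary but total; here terms are ordered by repr.
    """
    result = BOTTOM
    for term in sorted(set(terms), key=repr, reverse=True):
        result = ("G", term, result)
    return result


def reach_tooth(comb_term, k):
    """Reach the k-th tooth (counting from 0) using k steps of
    G(x, y) -> y followed by one step of G(x, y) -> x."""
    for _ in range(k):
        comb_term = comb_term[2]     # projection on the second argument
    return comb_term[1]              # projection on the first argument


teeth = [("b",), ("a",), ("c",)]
c = comb(teeth)
print(c)
print(reach_tooth(c, 2))
```

Every element of the original set is thus a reduct of the comb under the projective rules, which is the property the technical lemmas rely on.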

We give a few more technical lemmas showing the good behaviour of $I_S$ (written
$I$ for short).

Lemma 3. For each $t \in T(\mathcal{F}_1, X)$ and each substitution $\sigma$,
$I(t\sigma) = t\,I(\sigma)$.

Proof. Structural induction on $t$.

Lemma 4. For all $t_1, \ldots, t_n$ in $T(\mathcal{F}_1 \cup \mathcal{F}_2, X)$ and
for any context $C$ over $\mathcal{F}_1$ with $n$ holes,
$I_S(C[t_1, \ldots, t_n]) = C[I_S(t_1), \ldots, I_S(t_n)]$.

Proof. Structural induction on $C$.

Lemma 5. For any term $t$ strongly normalizable by $S_1 \cup S_2 \cup S_3$, $I(t)$ is
finite.

Proof. Trivial since we are interested in finitely branching TRS only.

Lemma 6. For any $s$ and $t$ in $T(\mathcal{F}_1 \cup \mathcal{F}_2, X)$, and
$l \to r \in S_1 \cup S_2$, if $s \xrightarrow[l \to r]{p} t$ then
$I(s) \xrightarrow[S_1 \cup S_2 \cup \pi]{+} I(t)$.

Moreover, if $\Lambda(s) \in \mathcal{F}_1$ and $p \neq \Lambda$ then
$I(s) \xrightarrow[S_1 \cup S_2 \cup \pi]{\neq\Lambda\ +} I(t)$.

Proof. Two cases depending on symbols occurring along the path from $\Lambda$ to $p$.

• If there are only symbols of $\mathcal{F}_1$ then
$s = C[s_1, \ldots, l\sigma, \ldots, s_n]$, $s|_p = l\sigma$ and $C$ is a context
with $n$ holes over $\mathcal{F}_1$.

$$I(s) = I(C[s_1, \ldots, l\sigma, \ldots, s_n])$$

$$= C[I(s_1), \ldots, I(l\sigma), \ldots, I(s_n)] \quad \text{(Lemma 4)}$$

$$= C[I(s_1), \ldots, l\,I(\sigma), \ldots, I(s_n)] \quad \text{(Lemma 3)}$$

$$\to_{S_1 \cup S_2} C[I(s_1), \ldots, r\,I(\sigma), \ldots, I(s_n)] \quad \text{(hypothesis)}$$

$$= C[I(s_1), \ldots, I(r\sigma), \ldots, I(s_n)] \quad \text{(Lemma 3)}$$

$$= I(C[s_1, \ldots, r\sigma, \ldots, s_n]) = I(t).$$

• If symbols of $\mathcal{F}_2$ occur, then there is a smallest $p' < p$ (w.r.t.
prefix ordering) s.t. $\Lambda(s|_{p'}) \in \mathcal{F}_2$. We may again assume
w.l.o.g. that $s = C[s_1, \ldots, s', \ldots, s_n]$ where $C$ is a context with $n$
holes (possibly empty) over $\mathcal{F}_1$, $p = p'q$ and $s|_{p'} = s'$ with
$s' = C'[l\sigma] \to_{S_1 \cup S_2} C'[r\sigma] = t'$. Then

$$I(s) = I(C[s_1, \ldots, s', \ldots, s_n]) = C[I(s_1), \ldots, I(s'), \ldots, I(s_n)] \quad \text{(Lemma 4)}.$$

From Def. 7, $I(s') = \mathrm{Comb}(\mathrm{Red}(s'))$. But
$s'|_q = l\sigma \to_{S_1 \cup S_2} r\sigma$, thus, by definition of
$\mathrm{Red}$, $I(t') \in \mathrm{Red}(s')$. Hence, $I(t')$ is a subterm of $I(s')$
and $I(s') \to^+_\pi I(t')$. We get

$$C[I(s_1), \ldots, I(s'), \ldots, I(s_n)] \to^+_\pi C[I(s_1), \ldots, I(t'), \ldots, I(s_n)]$$

$$= I(C[s_1, \ldots, t', \ldots, s_n]) \quad \text{(Lemma 4)} \quad = I(t).$$

Lemma 7. For all $s$ and $t$ in $T(\mathcal{F}_1 \cup \mathcal{F}_2, X)$, if
$s \to_{S_3} t$ then $I(s) \to^+_\pi I(t)$.

Moreover, if $\Lambda(s) \in \mathcal{F}_1$ then
$I(s) \xrightarrow[\pi]{\neq\Lambda\ +} I(t)$.

Proof. Similar to the proof of Lemma 6, case 2.
Proof of Lemma 2. Let $(u_1, v_1), (u_2, v_2), \ldots$ be a dependency chain of
$[\mathcal{F}_1 \mid S_2]$ over $S_1 \cup S_2 \cup S_3$ with a substitution $\sigma$.
Let $\sigma'$ be the substitution such that for all $x$, $x\sigma' = I(x\sigma)$.

Since the considered chain is minimal, $\sigma$ is strongly normalizable so, from
Lemma 5, $\sigma'$ is indeed a substitution to finite terms.

We show that $(u_1, v_1), (u_2, v_2), \ldots$ with $\sigma'$ is a chain of
$[\mathcal{F}_1 \mid S_2]$ over $S_1 \cup S_2 \cup \pi$. For that purpose we have to
show that for each $i$,
$v_i\sigma' \xrightarrow[S_1 \cup S_2 \cup \pi]{\neq\Lambda\ *} u_{i+1}\sigma'$.

We know that
$v_i\sigma \xrightarrow[S_1 \cup S_2 \cup S_3]{\neq\Lambda\ *} u_{i+1}\sigma$. Let us
consider any step $s \xrightarrow[S_1 \cup S_2 \cup S_3]{p} t$ from that reduction.
Since we have
$\Lambda(s) = \Lambda(t) = \Lambda(v_i) = \Lambda(u_{i+1}) \in \mathcal{F}_1$,
either from Lemma 7 or Lemma 6 we get that
$I(s) \xrightarrow[S_1 \cup S_2 \cup \pi]{\neq\Lambda\ +} I(t)$. Thus, putting
steps together we obtain
$I(v_i\sigma) \xrightarrow[S_1 \cup S_2 \cup \pi]{\neq\Lambda\ *} I(u_{i+1}\sigma)$.
Since $I(v_i\sigma) = v_i\sigma'$ and $I(u_{i+1}\sigma) = u_{i+1}\sigma'$ from
Lemma 3 we are done.

The following corollary enables us to compose our termination results, thus
allowing incremental proofs.

Corollary 2. Let $R_1(\mathcal{F}_1)$ and $[\mathcal{F}_2 \mid R_2]$ be such that
$[\mathcal{F}_1 \mid R_1] \leftarrow [\mathcal{F}_2 \mid R_2]$.

If $R_1$ is $C_{\mathcal{E}}$-terminating and if there is no infinite dependency
chain of $[\mathcal{F}_2 \mid R_2]$ over $R_1 \cup R_2 \cup \pi$, then
$R_1 \cup R_2$ is $C_{\mathcal{E}}$-terminating.

Proof. We apply Theorem 1 with $[\mathcal{F}_2 \cup \{G\} \mid R_2 \cup \pi]$
extending system $R_1$.

Those results can be turned into an effective method by Corollary 3.

Corollary 3. Let $R_1(\mathcal{F}_1)$ and $[\mathcal{F}_2 \mid R_2]$ be such that
$[\mathcal{F}_1 \mid R_1] \leftarrow [\mathcal{F}_2 \mid R_2]$.
If $R_1$ is $C_{\mathcal{E}}$-terminating and if there is a weak reduction ordering
(resp. $\pi$ extendable ordering) $(\succeq, \succ)$ such that:
$R_1 \cup R_2 \subseteq\ \succeq$ and $DP([\mathcal{F}_2 \mid R_2]) \subseteq\ \succ$
then $R_1 \cup R_2$ is terminating (resp. $C_{\mathcal{E}}$-terminating).

Proof. All chains of $[\mathcal{F}_2 \mid R_2]$ over $R_1 \cup R_2$ are actually
finite. Since steps between pairs decrease w.r.t. $\succeq$ and DP steps decrease
w.r.t. $\succ$, an infinite reduction would contradict the premise stating that
$(\succeq, \succ)$ is well-founded.



6.2 One System and Two Modules 

The other basic case, allowing us to apply our results to any hierarchical scheme. 

Theorem 2. Let $R_1(\mathcal{F}_1)$, $[\mathcal{F}_2 \mid R_2]$ and
$[\mathcal{F}_3 \mid R_3]$ be such that:

$[\mathcal{F}_1 \mid R_1] \leftarrow [\mathcal{F}_2 \mid R_2]$ and
$[\mathcal{F}_1 \mid R_1] \leftarrow [\mathcal{F}_3 \mid R_3]$ with
$\mathcal{F}_2 \cap \mathcal{F}_3 = \emptyset$.

1. If $R_1 \cup R_2$ is $C_{\mathcal{E}}$-strongly normalizing and

2. If there is no infinite dependency chain of $[\mathcal{F}_3 \mid R_3]$ over
$R_1 \cup R_3 \cup \pi$,

Then $R_1 \cup R_2 \cup R_3$ is $C_{\mathcal{E}}$-strongly normalizing.

Remark 4. Note that $R_2$ does not interfere with the premise over
$[\mathcal{F}_3 \mid R_3]$.

Further note that the $C_{\mathcal{E}}$-premise cannot be omitted: Toyama’s
counter-example (see Example 2) would then be a special case of the relevant
extension.

As a corollary and using Prop. 2 we obtain a previous result by Kurihara &
Ohuchi [9]: $C_{\mathcal{E}}$-termination is modular for composable unions of TRS.

Theorem 2 applies to automated proof when used as the following corollary:

Corollary 4. Let $R_1(\mathcal{F}_1)$, $[\mathcal{F}_2 \mid R_2]$ and
$[\mathcal{F}_3 \mid R_3]$ be such that:

$[\mathcal{F}_1 \mid R_1] \leftarrow [\mathcal{F}_2 \mid R_2]$ and
$[\mathcal{F}_1 \mid R_1] \leftarrow [\mathcal{F}_3 \mid R_3]$ with
$\mathcal{F}_2 \cap \mathcal{F}_3 = \emptyset$.

If $R_1 \cup R_2$ is $C_{\mathcal{E}}$-terminating and if there is a $\pi$ extendable
weak reduction ordering $(\succeq, \succ)$ s.t. $R_1 \cup R_3 \subseteq\ \succeq$ and
$DP([\mathcal{F}_3 \mid R_3]) \subseteq\ \succ$ then $R_1 \cup R_2 \cup R_3$ is
$C_{\mathcal{E}}$-terminating.



7 A Complete Example 

Consider system $R_\#$ over signature
$\mathcal{F}_\# = \{\# : \text{constant};\ 0, 1 : \text{postfix unary}\}$ describing
integers in binary notation: $R_\# = \{\#0 \to \#\}$. We want some arithmetic
over them and define addition with $[\mathcal{F}_+ \mid R_+]$:

$$\mathcal{F}_+ = \{+ : \text{binary}\},$$

$$R_+ = \left\{ \begin{array}{lll} x + \# \to x, & x0 + y0 \to (x + y)0, & x0 + y1 \to (x + y)1, \\ \# + x \to x, & x1 + y0 \to (x + y)1, & x1 + y1 \to ((x + y) + \#1)0, \\ x + (y + z) \to (x + y) + z. & & \end{array} \right.$$

$R_+$ has critical pairs: its innermost termination does not imply termination
and we cannot apply modularity criteria from Arts & Giesl [2]. We use relative
dependency pairs and a polynomial interpretation.

$$DP([\mathcal{F}_+ \mid R_+]) = \left\{ \begin{array}{l} (x0 + y0,\ x + y),\quad (x0 + y1,\ x + y),\quad (x1 + y0,\ x + y), \\ (x1 + y1,\ x + y),\quad (x1 + y1,\ (x + y) + \#1), \\ (x + (y + z),\ x + y),\quad (x + (y + z),\ (x + y) + z) \end{array} \right\}$$

We use: $[\![\#]\!] = 0$, $[\![0]\!](x) = [\![x]\!] + 1$,
$[\![1]\!](x) = [\![x]\!] + 3$, $[\![+]\!](x, y) = [\![x]\!] + 2[\![y]\!] + 1$. Since
$DP([\mathcal{F}_+ \mid R_+])$ strictly decreases and $R_\# \cup R_+ \cup \pi$ weakly
decreases, by Cor. 3, $R_\# \cup R_+$ $C_{\mathcal{E}}$-terminates. We might then add
subtraction:

$$\mathcal{F}_- = \{- : \text{binary}\},$$

$$R_- = \left\{ \begin{array}{lll} x - \# \to x, & x1 - y1 \to (x - y)0, & x0 - y0 \to (x - y)0, \\ x1 - y0 \to (x - y)1, & x0 - y1 \to ((x - y) - \#1)1. & \end{array} \right.$$

Relative dependency pairs and polynomial interpretation again suffice to prove 
that R^ U R- Cf-terminates. We use: |#] = 0, |0](x) = |x] -I- 1, |l](x) = 
|x] -I- 1, |— ](x, 2 /) = |x]. Dependency pairs of [T- \ 7?_] strictly decrease while 
R^ U R- weakly decreases w.r.t. to that interpretation. Applying Cor. 4 we 
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conclude that U R- U R+ Cf:-terminates. In order to use comparison we need 
Booleans and provide [IFbooI | ^Booi]- 

^Boo\ {true, false : constants] : unary, A : infix binary, if : ternary}, 
(-•(true) -i- false, -•(false) true, x A true -Ax, 

I a; A false -A false, if (true, x, y) -A x, if (false, x, y) -A y. 

That system has no dependency pair, hence $C_{\mathcal{E}}$-terminates.
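The two polynomial interpretations used above for addition and subtraction can be checked mechanically. The following is a minimal sketch, not the CiME implementation: it assumes the rule sets as read from the text, represents linear polynomials over the naturals as coefficient tables, and uses coefficient-wise comparison (a sufficient condition for the ordering on all natural values). The helper names (`interpret`, `dominates`, `dep_pairs`) are illustrative, the dependency pairs are computed unmarked from the rules of the module alone, and the projection rules of $\pi$ are not checked.

```python
from collections import Counter

# Linear polynomials over the naturals: Counter {variable: coefficient, "1": constant}.
def const(c): return Counter({"1": c})
def scale(k, p): return Counter({v: k * c for v, c in p.items()})
def add(*ps):
    out = Counter()
    for p in ps:
        out.update(p)            # Counter.update adds coefficients
    return out

def interpret(term, interp):
    """Variables are strings; applications are tuples (symbol, arg1, ...)."""
    if isinstance(term, str):
        return Counter({term: 1})
    head, *args = term
    return interp[head](*(interpret(a, interp) for a in args))

def dominates(p, q, strict=False):
    """Sufficient test for p >= q (or p > q) on all natural values."""
    if any(p[k] < q[k] for k in set(p) | set(q)):
        return False
    return p["1"] > q["1"] if strict else True

def subterms(t):
    yield t
    if not isinstance(t, str):
        for a in t[1:]:
            yield from subterms(a)

def dep_pairs(rules, defined):
    """Pairs <l, s> with s a subterm of r rooted by a symbol defined in the module."""
    return [(l, s) for l, r in rules for s in subterms(r)
            if not isinstance(s, str) and s[0] in defined]

def check(rules, dps, interp):
    weak = all(dominates(interpret(l, interp), interpret(r, interp)) for l, r in rules)
    strict = all(dominates(interpret(l, interp), interpret(r, interp), strict=True)
                 for l, r in dps)
    return weak and strict

# Term constructors: '#' is the constant, 0 and 1 are the postfix unary symbols.
H = ("#",)
def O(t): return ("0", t)
def I(t): return ("1", t)
def plus(a, b): return ("+", a, b)
def minus(a, b): return ("-", a, b)
x, y, z = "x", "y", "z"

R_N = [(O(H), H)]
R_plus = [
    (plus(x, H), x), (plus(H, x), x),
    (plus(O(x), O(y)), O(plus(x, y))), (plus(O(x), I(y)), I(plus(x, y))),
    (plus(I(x), O(y)), I(plus(x, y))),
    (plus(I(x), I(y)), O(plus(plus(x, y), I(H)))),
    (plus(x, plus(y, z)), plus(plus(x, y), z)),
]
R_minus = [
    (minus(x, H), x),
    (minus(I(x), I(y)), O(minus(x, y))), (minus(O(x), O(y)), O(minus(x, y))),
    (minus(I(x), O(y)), I(minus(x, y))),
    (minus(O(x), I(y)), I(minus(minus(x, y), I(H)))),
]

interp_plus = {"#": lambda: const(0), "0": lambda a: add(a, const(1)),
               "1": lambda a: add(a, const(3)),
               "+": lambda a, b: add(a, scale(2, b), const(1))}
interp_minus = {"#": lambda: const(0), "0": lambda a: add(a, const(1)),
                "1": lambda a: add(a, const(1)), "-": lambda a, b: a}

dps_plus = dep_pairs(R_plus, {"+"})
dps_minus = dep_pairs(R_minus, {"-"})
print(check(R_N + R_plus, dps_plus, interp_plus),
      check(R_N + R_minus, dps_minus, interp_minus))
```

Because every interpretation here is linear, comparing coefficients is enough; a non-linear interpretation would need a different comparison.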

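We now define comparisons in $[F_{ge} \mid R_{ge}]$ extending $R_N$ and $R_{Bool}$.

$F_{ge} = \{\text{ge} : \text{binary}\}$,

$R_{ge} = \{\ \text{ge}(x0, y0) \to \text{ge}(x, y),\quad \text{ge}(\#, x1) \to \text{false},\quad \text{ge}(x1, y0) \to \text{ge}(x, y),$
$\text{ge}(x1, y1) \to \text{ge}(x, y),\quad \text{ge}(x, \#) \to \text{true},\quad \text{ge}(\#, x0) \to \text{ge}(\#, x),$
$\text{ge}(x0, y1) \to \neg\text{ge}(y, x)\ \}$.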



Termination of $R_N \cup R_{Bool} \cup R_{ge}$ is proven using RPO with precedence
$\{\text{ge} > \neg > (\text{true}, \text{false})\}$. RPO is a simplification ordering thus $\pi$-extendable: that
union $C_{\mathcal{E}}$-terminates by Prop. 1. We may then apply Thm 2 to conclude that
$R_N \cup R_{Bool} \cup R_{ge} \cup R_+ \cup R_-$ $C_{\mathcal{E}}$-terminates.

Let us provide logarithm Log over integers. It is technically easier firstly to
define Log'(x) = Log(x) + 1 with convention Log'(0) = 0.

$F_{Log'} = \{\text{Log}' : \text{unary}\}$,

$R_{Log'} = \{\ \text{Log}'(\#) \to \#,\quad \text{Log}'(x1) \to \text{Log}'(x) + \#1,$
$\text{Log}'(x0) \to \text{if}(\text{ge}(x, \#1),\ \text{Log}'(x) + \#1,\ \#)\ \}$.



We use again relative dependency pairs and a polynomial interpretation.
$DP([F_{Log'} \mid R_{Log'}]) = \{\langle \text{Log}'(x1), \text{Log}'(x)\rangle,\ \langle \text{Log}'(x0), \text{Log}'(x)\rangle\}$. For $[\#] = 0$,
$[0](x) = [x] + 1$, $[1](x) = [x] + 1$, $[+](x, y) = [x] + [y]$, $[\text{false}] = 0$,
$[\text{true}] = 0$, $[\neg](x) = 0$, $[\text{ge}](x, y) = 0$, $[\text{if}](x, y, z) = [y] + [z]$, $[\wedge](x, y) = [x]$,
$[\text{Log}'](x) = [x]$, relative dependency pairs strictly decrease while rules of
$R_N \cup R_+ \cup R_{Bool} \cup R_{ge} \cup R_{Log'}$ weakly decrease. By Cor. 4 we prove $C_{\mathcal{E}}$-termination
of $R_N \cup R_+ \cup R_{Bool} \cup R_{ge} \cup R_{Log'} \cup R_-$.

We may now compute logarithm: 



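$F_{Log} = \{\text{Log} : \text{unary}\}$, $R_{Log} = \{\ \text{Log}(x) \to \text{Log}'(x) - \#1\ \}$.

Since $[F_{Log} \mid R_{Log}]$ has no dependency pair, applying Thm 2 we conclude on
$C_{\mathcal{E}}$-termination of $R_N \cup R_+ \cup R_- \cup R_{Bool} \cup R_{ge} \cup R_{Log'} \cup R_{Log}$.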



8 Related Work 

Our work has to compare firstly with Dershowitz’s results [6]. Since we do not 
restrict ourselves to constructor based systems, and use a slightly more general 
definition of hierarchical extensions, we thus obtain more general conditions. 
Moreover our criteria (fully syntactical and applicable to most TRS met in prac- 
tice) seem more suited for automation — because they were designed for it — than 
the finely tuned conditions of Dershowitz. Secondly, Arts & Giesl exploit the
modular structure of dependency graphs [2]. Still, their criterion puts conditions
over the whole system (whatever the extension might be). That is a drawback
we wanted to get rid of, because it fundamentally acts as a brake on real
incremental proving. As noticed in Rem. 4, criteria based on Thm 2 do not require
anything from irrelevant sets of rules. Our framework furthermore provides for 
the general case the results they got for the special case of innermost rewriting. 

9 Conclusion & Future Work

We proposed the notion of rewrite modules in order to express structural and
hierarchical information about TRS. Different kinds of extensions can be seen as
special cases of module hierarchy schemes. Modules are a fertile ground for
automated incremental termination proofs: we introduced relative dependency pairs
and criteria that allow proofs to be split up and run in an incremental fashion. We
eventually obtain as a corollary a former result from Kurihara & Ohuchi [9] that
is directly and easily expressed in that framework. The relative dependency pairs
approach, with the help of $\pi$-extendable orderings, provides criteria particularly
suited for automation. The resulting tests are finite and purely syntactical, thus
implementable. Moreover that method weakens constraints over orderings to be
found in two ways. Firstly, the (really) incremental scheme itself filters only
relevant rules; secondly, the relative dependency pairs widen the class of suitable
orderings over these relevant rules.

Marked pairs, dependency graphs and thus graph refinements can be defined
in the module framework; that extension is straightforward. Another extension,
quite important in practice, is the application of modules to TRS with associative 
and commutative symbols. This could lead to relative pairs somewhat similar to 
AC-extended pairs [11]. 

Finally, modules and relative dependency pairs are now full parts of the
CiME2 [4] termination tool developed in our research team.

Abstract. We define a subclass of the class of linear equational theories,
called finitely closable linear theories. We consider unification problems
with no repeated variables. We show the decidability of this subclass, and
give an algorithm in PSPACE. If all function symbols are monadic, then
the running time is in NP, and quadratic for unitary monadic finitely
closable linear theories.



1 Introduction 

The problem of E-unification [1] is an important problem for automated
deduction, as well as other areas of computer science, such as formal verification and
type inference. Given an equational theory E, an E-unifier of terms s and t is a
substitution $\theta$ such that $s\theta$ and $t\theta$ are equivalent modulo E. In many
applications it is necessary to find a complete set of E-unifiers of terms s and t, that
is, to find a set of E-unifiers of s and t from which all other E-unifiers can be
generated.

Unfortunately, E-unification is undecidable in general. In addition, for some
equational theories there is no finite complete set of unifiers. Therefore, it is
necessary to determine which classes of equational theories have a decision
algorithm, and for which E-unification problems. Furthermore, the complexity of
those algorithms should be analyzed.

There has been much work in finding particular equational theories with
decidable E-unification problems and analyzing their complexity. There has been
less work in identifying classes of equational theories with decidable E-unification
problems. However, there has been some recent work in that area, but not all of
it analyzes complexity. See [9] for some references.

Recently, we have developed a simple new method of E-unification and
proved its soundness and completeness [6] for all equational theories. In [7] we
have refined it for linear theories. The method is a generalization of the
General Mutation inference rules for Syntactic Theories [2, 3, 4, 5]. It is an inference
procedure that does not always halt. However, the goal of developing this new
method was to use it to find decidable classes of equational theories and analyze
their complexity, which is what we do in this paper.

** This work was supported by NSF grant number CCR-9712388.

We consider linear theories, i.e., theories where in each equation no terms
have repeated variables, although terms on opposite sides of an equation may
share variables. This class of equational theories includes all theories with
monadic function symbols. We only consider E-unification problems whose set
of goal equations contains no repeated variables. This is a restricted E-unification
problem, but it contains the word problem, which is undecidable for equations
on strings, and it also includes some existential problems.

The particular class we prove decidability of is what we call finitely closable
theories. To use our algorithm, we must assume we know a finite set of terms,
such that we can find a complete set of unifiers for each pair of those terms. If the
terms that appear in each complete set of unifiers are already in the set, then we
call the set finitely closed. When such a set exists, we have an algorithm to solve
the E-unification problems mentioned in the previous paragraph. We show the
algorithm is in PSPACE. However, for the case of monadic function symbols it is
in NP, and furthermore it is quadratic if each complete set of unifiers mentioned
above is unitary.

Of course, we have not mentioned, so far, how to find this finite set. We also 
show some ways in which such a finite set can be found. 

The format of the paper is to give some preliminary definitions, then to 
present the algorithm which gives our decidability results and prove the com- 
plexity results. Finally we give a method for finding the finite set in some cases. 

2 Preliminaries 

We assume we are given a set of variables and a set of uninterpreted function
symbols of various arities. An arity is a non-negative integer. Terms are defined
recursively in the following way: each variable is a term, and if $t_1, \dots, t_n$ are
terms, and f is of arity $n \ge 0$, then $f(t_1, \dots, t_n)$ is a term, and f is the symbol
at the root of $f(t_1, \dots, t_n)$. A term (or any object) without variables is called
ground. If t is any object, then $Var(t)$ is the set of all variables in t.

We consider equations of the form $s \approx t$, where s and t are terms. Let E
be a set of equations, and $u \approx v$ be an equation, then we write $E \models u \approx v$ (or
$u =_E v$) if $u \approx v$ is true in any model of E. If G is a set of equations, then
$E \models G$ means that $E \models e$ for all e in G. If all the function symbols in E are of
arity no greater than one, then E is monadic.

A substitution is a mapping from the set of variables to the set of terms,
such that it is almost everywhere the identity. We identify a substitution with
its homomorphic extension. If $\theta$ is a substitution then $Dom(\theta) = \{x \mid x\theta \neq x\}$.
The range of $\theta$, $Ran(\theta)$, is $\{x\theta \mid x \in Dom(\theta)\}$. A substitution $\sigma$ is idempotent
if $\sigma\sigma = \sigma$. In this paper, all substitutions will be considered to be idempotent.
A substitution $\theta$ is an E-unifier of an equation $u \approx v$ if $E \models u\theta \approx v\theta$. $\theta$ is
an E-unifier of a set of equations G if $\theta$ is an E-unifier of all equations in G.
Whenever an equation or a set of equations has an E-unifier, it also has an
idempotent E-unifier. If $\theta$ is an E-unifier of $u \approx v$, we say that $\theta$ is linear if no
variable appears more than twice in $Ran(\theta)$, and if a variable z appears twice
in $Ran(\theta)$ then there is an x in u and a y in v such that z appears in $x\theta$ and z
appears in $y\theta$. This implies that there are not two different variables x and w in
u such that z appears in $x\theta$ and $w\theta$.
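The notions of domain, idempotence and linearity of a substitution can be made concrete with a small sketch. The term encoding (variables as strings, applications as tuples) and the function names are assumptions made for illustration only; the linearity test assumes that u and v share no variables, as in the goals of varity 1 considered in this paper.

```python
from collections import Counter

def apply(sub, t):
    """Apply a substitution (dict: variable -> term) to a term."""
    if isinstance(t, str):                      # variables are plain strings
        return sub.get(t, t)
    return (t[0],) + tuple(apply(sub, a) for a in t[1:])

def var_occurrences(t):
    """Count how often each variable occurs in a term."""
    if isinstance(t, str):
        return Counter([t])
    total = Counter()
    for a in t[1:]:
        total += var_occurrences(a)
    return total

def domain(sub):
    return {x for x, t in sub.items() if t != x}

def is_idempotent(sub):
    # sigma sigma = sigma: applying the substitution twice changes nothing
    return all(apply(sub, apply(sub, x)) == apply(sub, x) for x in sub)

def is_linear(sub, u_vars, v_vars):
    """Every range variable occurs at most once on the u side and at most once on the v side."""
    dom = domain(sub)
    left = sum((var_occurrences(sub[x]) for x in u_vars if x in dom), Counter())
    right = sum((var_occurrences(sub[y]) for y in v_vars if y in dom), Counter())
    return all(c <= 1 for c in left.values()) and all(c <= 1 for c in right.values())

# A unifier of f(x1, x2) and f(y1, y2) modulo associativity and commutativity.
sigma = {"x1": ("f", "z1", "z2"), "x2": ("f", "z3", "z4"),
         "y1": ("f", "z1", "z3"), "y2": ("f", "z2", "z4")}
print(is_idempotent(sigma), is_linear(sigma, ["x1", "x2"], ["y1", "y2"]))
```

In `sigma` each of z1 to z4 occurs twice in the range, once in the image of an x and once in the image of a y, which is exactly the situation the definition of linearity allows.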

If $\sigma$ and $\theta$ are substitutions, then we write $\sigma \le_E \theta\,[Var(G)]$ if there is a
substitution $\rho$ such that $E \models x\sigma\rho \approx x\theta$ for all x appearing in G. If G is a set
of equations, then a substitution $\theta$ is a most general unifier of G, written
$\theta = mgu(G)$, if $\theta$ is an E-unifier of G, and for all E-unifiers $\sigma$ of G, $\theta \le_E \sigma\,[Var(G)]$.
A complete set of E-unifiers of G is a set $\Theta$ of E-unifiers of G such that for all
E-unifiers $\sigma$ of G, there is a $\theta$ in $\Theta$ such that $\theta \le_E \sigma\,[Var(G)]$.

Given a unification problem we can either solve the unification problem or
decide the unification problem. Given a goal G and a set of equations E, to
solve the unification problem means to find a complete set of E-unifiers of G.
To decide the unification problem simply means to answer true or false as to
whether G has an E-unifier.

We say that a term t (or an equation or a set of equations) has varity n
if each variable in t appears at most n times. An equation $s \approx t$ is linear if s
and t are both of varity 1. Note that the equation $s \approx t$ is then of varity 2,
but it might not be of varity 1. A set of equations is linear if each equation
in the set is linear. For example, the axioms of group theory
$\{f(x, f(y, z)) \approx f(f(x, y), z),\ f(w, e) \approx w,\ f(u, i(u)) \approx e\}$ are of varity 2.
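Varity is straightforward to compute by counting variable occurrences. The sketch below uses an assumed encoding (variables as strings, applications and constants as tuples) and reports the least n for which the object has varity n; the names `varity` and `is_linear_equation` are illustrative.

```python
from collections import Counter

def var_occurrences(t):
    """Count how often each variable occurs in a term."""
    if isinstance(t, str):                 # variables are plain strings
        return Counter([t])
    total = Counter()
    for a in t[1:]:                        # constants are 1-tuples such as ("e",)
        total += var_occurrences(a)
    return total

def varity(*terms):
    """Least n such that every variable occurs at most n times in the given terms."""
    total = sum((var_occurrences(t) for t in terms), Counter())
    return max(total.values(), default=0)

def is_linear_equation(s, t):
    return varity(s) <= 1 and varity(t) <= 1

# The group axioms from the text.
assoc = (("f", "x", ("f", "y", "z")), ("f", ("f", "x", "y"), "z"))
unit = (("f", "w", ("e",)), "w")
inverse = (("f", "u", ("i", "u")), ("e",))

for name, (lhs, rhs) in [("assoc", assoc), ("unit", unit), ("inverse", inverse)]:
    print(name, varity(lhs, rhs), is_linear_equation(lhs, rhs))
```

All three axioms have varity 2 as equations, but the inverse axiom is not linear, because u already occurs twice on its left-hand side.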

3 Algorithm 

We will be considering linear equational theories E. The goals G we are trying to
solve are sets of equations with no repeated variables (varity 1). In this section
we will give an E-unification algorithm, and in the next section we will prove
the algorithm halts for E-closed sets T, defined below, and give the complexity
of the algorithm.

Definition 1. A set of terms T is called E-closed if it satisfies the following
conditions:

1. every term in T is of varity 1;

2. no member of T is a variable;

3. if f is a symbol of arity $n \ge 0$ appearing in E, then $f(x_1, \dots, x_n) \in T$;

4. T contains two new constants c and d, which are not symbols of E;

5. if s and t are renamings of terms in T, and $\theta \in CSU_E(s, t)$, then $\theta$ is
linear, and for all x in $Var(s \approx t)$, whenever $x\theta$ is not a variable, there is
a renaming $\rho$ such that $x\theta\rho \in T$;

6. if t' is a nonvariable subterm of a term t in T, then there is a renaming $\rho$ such that
$t'\rho \in T$.

In the definition of T we assume that we are able to calculate a complete set
of E-unifiers for all pairs of terms in T. Each such T could have an associated
table listing the complete set of unifiers for each pair of terms in T. If such a T
exists, we will show that the E-unification problem for all goals G of varity 1 is
solvable. But first we will show that if G contains symbols that are not in T, then
T and its associated table of complete sets of unifiers can easily be extended to
handle such goals. First T is extended so that whenever $u[c]$ is a member of T
for some term u, then $u[f(x_1, \dots, x_n)]$ is added to T for every new symbol f,
of arity $n \ge 0$, appearing in G. Then the table of complete sets of E-unifiers is
extended as follows.

Let $f(x_1, \dots, x_n)$ and $g(y_1, \dots, y_m)$ be terms in the extended T, such that
f and g are different symbols, and at least one of f and g did not exist in
E. If f is not a symbol in E, then let u = c, else let $u = f(x_1, \dots, x_n)$. If
g is not a symbol in E, then let v = d, else let $v = g(y_1, \dots, y_m)$. Find the
complete set of E-unifiers $\{\sigma_1, \dots, \sigma_k\}$ of u and v. Let $\{\theta_1, \dots, \theta_k\}$ be the set of
substitutions such that each $\theta_i$ is created from $\sigma_i$ by replacing each occurrence of
c in the range of $\sigma_i$ by $f(x_1, \dots, x_n)$, and replacing every occurrence of d in the
range of $\sigma_i$ by $g(y_1, \dots, y_m)$. Then $\{\theta_1, \dots, \theta_k\}$ is a complete set of E-unifiers
for $f(x_1, \dots, x_n) \approx g(y_1, \dots, y_m)$. Furthermore, all terms in the range of each $\theta_i$
have already been added to T.

Again, let f be a symbol in G that is not in E. Then, a complete set of
E-unifiers for $f(x_1, \dots, x_n) \approx f(y_1, \dots, y_n)$ is
$\{[x_1 \mapsto z_1, \dots, x_n \mapsto z_n, y_1 \mapsto z_1, \dots, y_n \mapsto z_n]\}$. All terms in the range of this substitution are variables.

Now we have an extended T which is E-closed over the symbols of $E \cup G$,
and we have an extended table of complete sets of E-unifiers. For the rest of this
paper, we will assume we are working with this extended set.

We give several examples of E-closed sets.

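Example 1. Let E be the theory of associativity and commutativity,
$\{f(f(x, y), z) \approx f(x, f(y, z)),\ f(x, y) \approx f(y, x)\}$. Let $T = \{f(x, y), c, d\}$. Then
any pair of terms where one of them is c or d has no E-unifiers. So, to prove
that T is E-closed we only need to check $CSU_E(f(x_1, x_2), f(y_1, y_2))$. In fact,
$CSU_E(f(x_1, x_2), f(y_1, y_2)) = \{\sigma_1, \sigma_2, \sigma_3, \sigma_4, \sigma_5, \sigma_6, \sigma_7\}$, where

- $\sigma_1 = [x_1 \mapsto f(z_1, z_2),\ x_2 \mapsto f(z_3, z_4),\ y_1 \mapsto f(z_1, z_3),\ y_2 \mapsto f(z_2, z_4)]$

- $\sigma_2 = [x_1 \mapsto z_2,\ x_2 \mapsto f(z_3, z_4),\ y_1 \mapsto z_3,\ y_2 \mapsto f(z_2, z_4)]$

- $\sigma_3 = [x_1 \mapsto z_1,\ x_2 \mapsto f(z_3, z_4),\ y_1 \mapsto f(z_1, z_3),\ y_2 \mapsto z_4]$

- $\sigma_4 = [x_1 \mapsto f(z_1, z_2),\ x_2 \mapsto z_4,\ y_1 \mapsto z_1,\ y_2 \mapsto f(z_2, z_4)]$

- $\sigma_5 = [x_1 \mapsto f(z_1, z_2),\ x_2 \mapsto z_3,\ y_1 \mapsto f(z_1, z_3),\ y_2 \mapsto z_2]$

- $\sigma_6 = [x_1 \mapsto z_2,\ x_2 \mapsto z_3,\ y_1 \mapsto z_3,\ y_2 \mapsto z_2]$

- $\sigma_7 = [x_1 \mapsto z_1,\ x_2 \mapsto z_4,\ y_1 \mapsto z_1,\ y_2 \mapsto z_4]$

Notice that whenever a nonvariable term appears in the range of some $\sigma_i$, then
a renaming of that term appears in T. Therefore, T is E-closed.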
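The closure condition for this example can be checked mechanically. The sketch below assumes the seven substitutions as read from the text, tests that each of them unifies f(x1, x2) and f(y1, y2) modulo associativity and commutativity by comparing the multisets of leaves, and tests that every nonvariable term in a range is a renaming of a member of T. It does not check that the set is complete; the helper names are illustrative.

```python
from collections import Counter

def f(a, b):
    return ("f", a, b)

def apply(sub, t):
    if isinstance(t, str):                       # variables are plain strings
        return sub.get(t, t)
    return (t[0],) + tuple(apply(sub, a) for a in t[1:])

def flatten(t):
    """Multiset of leaves under nested f; equal multisets mean equal modulo AC."""
    if not isinstance(t, str) and t[0] == "f":
        return flatten(t[1]) + flatten(t[2])
    return Counter([t])

def is_renaming(s, t):
    """True if s and t are equal up to a bijective renaming of variables."""
    fwd, bwd = {}, {}
    def go(a, b):
        if isinstance(a, str) or isinstance(b, str):
            if not (isinstance(a, str) and isinstance(b, str)):
                return False
            return fwd.setdefault(a, b) == b and bwd.setdefault(b, a) == a
        return (a[0] == b[0] and len(a) == len(b)
                and all(go(x, y) for x, y in zip(a[1:], b[1:])))
    return go(s, t)

T = [f("x", "y"), ("c",), ("d",)]

def sub(x1, x2, y1, y2):
    return {"x1": x1, "x2": x2, "y1": y1, "y2": y2}

SIGMAS = [
    sub(f("z1", "z2"), f("z3", "z4"), f("z1", "z3"), f("z2", "z4")),
    sub("z2", f("z3", "z4"), "z3", f("z2", "z4")),
    sub("z1", f("z3", "z4"), f("z1", "z3"), "z4"),
    sub(f("z1", "z2"), "z4", "z1", f("z2", "z4")),
    sub(f("z1", "z2"), "z3", f("z1", "z3"), "z2"),
    sub("z2", "z3", "z3", "z2"),
    sub("z1", "z4", "z1", "z4"),
]

def unifies_mod_ac(s):
    return flatten(apply(s, f("x1", "x2"))) == flatten(apply(s, f("y1", "y2")))

def range_closed(s):
    return all(isinstance(t, str) or any(is_renaming(t, m) for m in T)
               for t in s.values())

print(all(unifies_mod_ac(s) for s in SIGMAS), all(range_closed(s) for s in SIGMAS))
```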



Example 2. Let E be the monadic theory $\{fgfx \approx gfgx\}$. Let
$T = \{fx, gy, fgz, gfw, c, d\}$. Then again, any pair where one term is c or d is not
unifiable. The complete set of unifiers of any term with a renaming of itself,
such as $fx_1$ and $fx_2$, has as most general E-unifier $[x_1 \mapsto z, x_2 \mapsto z]$. There
are twelve more pairs that must be checked. For example $CSU_E(fx, gy) =
\{[x \mapsto gfz, y \mapsto fgz]\}$. Also $CSU_E(fx, gfy) = \{[x \mapsto gfz, y \mapsto gz]\}$. Also
$CSU_E(fgx, gfy) = \{[x \mapsto fz, y \mapsto gz]\}$. We leave it to the interested reader to
check the others. Notice that any term that appears in the range of a unifier is
a renaming of something in T. So T is E-closed.



Example 3. Let E be the monadic theory $\{fggx \approx gffx\}$. Let
$T = \{fx, gy, ffz, ggw, c, d\}$. Once again, any pair involving c or d is not E-unifiable.
A pair of two renamings of the same term is as in the previous example. The
pair of terms fx and gy has a most general E-unifier $[x \mapsto ggz, y \mapsto ffz]$. No
other pair of terms is E-unifiable. Therefore, to show that T is E-closed, we only
need to verify that renamings of ggz and ffz are in T.



Example 4. Let $E = \{fx \approx x\}$. Let $T = \{fx, c, d\}$. Then $fx \approx fy$ has a most
general E-unifier $[x \mapsto z, y \mapsto z]$. Also, $c \approx fx$ has a most general E-unifier
$[x \mapsto c]$. The other complete sets of E-unifiers are easy. T is E-closed, because
the only nonvariable terms which appear in the range of a unifier in a complete
set of E-unifiers are c and d. Now, suppose we want to consider a goal containing
a new monadic function symbol g. First, we add gy to T. Then we note that
$c \approx fx$ has a most general E-unifier $[x \mapsto c]$. Therefore, $[x \mapsto gy]$ must be a
most general E-unifier of $gy \approx fx$. So the extended set is also E-closed.
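The table extension used in this example amounts to replacing the placeholder constant in the range of a known unifier. A minimal sketch, with an assumed tuple encoding of terms and hypothetical helper names:

```python
def replace_constant(term, const, new_term):
    """Replace every occurrence of the constant `const` in `term` by `new_term`."""
    if isinstance(term, str):                    # variables are plain strings
        return term
    if term == (const,):                         # constants are 1-tuples
        return new_term
    return (term[0],) + tuple(replace_constant(a, const, new_term) for a in term[1:])

def extend_unifier(sigma, const, new_term):
    """Build the unifier for the new symbol from the unifier for the placeholder."""
    return {x: replace_constant(t, const, new_term) for x, t in sigma.items()}

sigma = {"x": ("c",)}                            # mgu of c and f(x) when E = {f(x) = x}
extended = extend_unifier(sigma, "c", ("g", "y"))
print(extended)
```

Here `extended` maps x to g(y), which is the most general E-unifier of gy and fx stated in the example.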

We define a function called H to calculate the height of a term in terms of 
the set T. The height is defined so that a term from T is considered as if it was 
a single symbol. 

Definition 2. Let T be an E-closed set of terms. $H(t)$ is defined recursively in
terms of T.

1. $H(x) = 0$, if x is a variable;

2. $H(s, \rho) = 1 + \max\{H(x\rho) \mid x \in Var(s)\}$, if $\rho$ is a substitution;

3. $H(t) = \min\{H(s, \rho) \mid t = s\rho \text{ and } s \in T\}$ if there exists an $s \in T$ and a $\rho$
such that $s\rho = t$.

Note that item 3 applies to a term t if the root symbol of t is in E, since
we have said that $f(x_1, \dots, x_n) \in T$ for all symbols f in E. If T is extended
to include symbols in t as explained above, then item 3 always applies. If E
is empty, then this definition gives the standard definition of the height of a
term, which we denote $SH(t)$. The height of a term is the minimum number of
applications of terms in T it takes to construct the term. If $H(t) = n$, we say
that the T-height of t is n. If $SH(t) = m$, we say the standard height of t is m.

Example 5. For example, consider the set T to be $\{fx, gy, fgz, gfw, c, d\}$. Then
the T-height $H(x) = 0$ and $H(fx) = H(gx) = H(fgx) = H(gfx) = 1$. The
following set of terms are all of T-height 2:

$\{ffx, ggx, ffgx, gfgx, fgfx, gfgx, fgfgx, gffgx, fggfx, gfgfx\}$.

Let $h = \max\{SH(t) \mid t \in T\}$. We can see from the definition that $H(t) \le
SH(t)$ and $SH(t) \le h \times H(t)$.
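For monadic terms the T-height is the least number of pieces from T needed to spell the word of function symbols above the variable. The sketch below assumes monadic terms are encoded as strings of symbols with the variable left implicit, and omits the constants c and d; `t_height` is an illustrative name.

```python
from functools import lru_cache

# The non-constant members of T from the example, written as words of symbols.
T = ("f", "g", "fg", "gf")

@lru_cache(maxsize=None)
def t_height(word):
    """T-height of the monadic term word(x): 0 for a variable, else 1 + height of the rest."""
    if word == "":
        return 0
    return min(1 + t_height(word[len(s):]) for s in T if word.startswith(s))

def standard_height(word):
    return len(word)

h = max(len(s) for s in T)
for w in ["f", "fg", "ff", "fgf", "gffg", "fgfg"]:
    print(w + "x", t_height(w), standard_height(w))
```

The inequalities relating T-height and standard height can be checked on the same words: `t_height(w) <= len(w) <= h * t_height(w)`.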

As for height, we define the standard size of a term and the T-size of a term. 

Definition 3. Let T be an E-closed set. The T-size of a term t, $|t|$, is defined
recursively as:

1. $|x| = 0$, for any variable x;

2. $|s|_\rho = 1 + \sum\{|x\rho| \mid x \in Var(s)\}$, if $\rho$ is a substitution;

3. $|t| = \min\{|s|_\rho \mid t = s\rho \text{ and } s \in T\}$ if there exists an $s \in T$ and a $\rho$ such that
$s\rho = t$.

If $E = \emptyset$, then $|t|$ is the standard size of t. The T-size is related to the
standard size in the same way as the T-height is related to the standard height.

If an E-closed set T is finite and G has no repeated variables, then we will
prove that we can solve the E-unification problem for G. For the rest of this
section, we will assume that T is closed and finite. Since G has no repeated
variables, each equation in G can be solved separately without affecting the
other results, so for simplicity we will assume that G is a single equation.

An equation $x \approx t$, where x is a variable, is called a solved equation.

Our algorithm is based on the following inference rule: 

Suppose the goal is $u \approx v$. Let s and t be terms in T, and let $\rho$ be a
substitution such that $s\rho = u$ and $t\rho = v$, and such that $H(s, \rho) = H(u)$
and $H(t, \rho) = H(v)$.¹ We don't-know non-deterministically find a unifier
$\sigma \in CSU_E(s, t)$. If $Var(s \approx t) = \{x_1, \dots, x_n\}$ then the rule is the following:

Mutate

$$\frac{u \approx v}{\bigcup_{1 \le i \le n} x_i\rho \approx x_i\sigma}$$



Here is an example. 

Example 6. Let $E = \{fgfx \approx gfgx\}$ and let T be the E-closed set
$\{fx, gy, fgz, gfw, c, d\}$. Suppose that the goal is $fa \approx gb$. Then $CSU_E(fx, gy) =
\{\sigma\}$, where $\sigma = [x \mapsto gfz, y \mapsto fgz]$. We also find a matcher $\rho = [x \mapsto a, y \mapsto b]$
such that $fa = fx\rho$ and $gb = gy\rho$. The Mutate inference rule applies:

$$\frac{fa \approx gb}{a \approx gfz,\quad fgz \approx b}$$

This is because of the fact that $x\rho = a$, $x\sigma = gfz$, $y\sigma = fgz$, and $y\rho = b$.

It is obvious from this example that our inference rule is a generalization of the
Mutate Rule from [7].

Consider a related example. 



¹ This means that we use the same s, t and $\rho$ as in the definition of T-height.

Example 7. Let E and T be as in the above example. Suppose that the goal is
$fga \approx gfb$. Then $CSU_E(fgx, gfy) = \{\sigma\}$, where $\sigma = [x \mapsto fz, y \mapsto gz]$. We also
find a matcher $\rho = [x \mapsto a, y \mapsto b]$ such that $fga = fgx\rho$ and $gfb = gfy\rho$. The
Mutate inference rule applies:

$$\frac{fga \approx gfb}{a \approx fz,\quad gz \approx b}$$

This is because of the fact that $x\rho = a$, $x\sigma = fz$, $y\sigma = gz$, and $y\rho = b$. In this
example, if we chose s = fx, t = gy, and $\rho = [x \mapsto ga, y \mapsto fb]$, then it would
have still been true that $s\rho = fga$ and $t\rho = gfb$. However, this would not have
minimized the T-height, so it is not valid.
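A single Mutate step on monadic goals, as in the two examples above, can be sketched as follows. The encoding is an assumption: a monadic term is a pair (word of function symbols, leaf name), the table `CSU` holds only the three complete sets of unifiers stated in the text for $E = \{fgfx \approx gfgx\}$ (so other pairs raise a `KeyError`), ties between height-minimal prefixes are broken by taking the longest one, and the fresh variable is always called z. Both sides of the goal are assumed to be nonvariable.

```python
# Non-constant members of T, written as words of symbols.
T = ("fg", "gf", "f", "g")

def t_height(word):
    if not word:
        return 0
    return min(1 + t_height(word[len(s):]) for s in T if word.startswith(s))

def split_minimal(word):
    """Pick s in T and the remainder such that 1 + H(remainder) = H(word)."""
    h = t_height(word)
    candidates = [s for s in T
                  if word.startswith(s) and 1 + t_height(word[len(s):]) == h]
    s = max(candidates, key=len)
    return s, word[len(s):]

# Partial table: (s, t) -> list of (image of x, image of y), both applied to a fresh z.
CSU = {
    ("f", "g"): [("gf", "fg")],     # CSU(fx, gy)   = {[x -> gfz, y -> fgz]}
    ("f", "gf"): [("gf", "g")],     # CSU(fx, gfy)  = {[x -> gfz, y -> gz]}
    ("fg", "gf"): [("f", "g")],     # CSU(fgx, gfy) = {[x -> fz,  y -> gz]}
}

def mutate(u, v, fresh="z"):
    """Return one list of new equations per unifier in the table entry."""
    (u_word, u_leaf), (v_word, v_leaf) = u, v
    s, u_rest = split_minimal(u_word)
    t, v_rest = split_minimal(v_word)
    return [[((u_rest, u_leaf), (sx, fresh)),      # x rho = x sigma
             ((sy, fresh), (v_rest, v_leaf))]      # y sigma = y rho
            for sx, sy in CSU[(s, t)]]

def show(equations):
    return [(lw + ll, rw + rl) for (lw, ll), (rw, rl) in equations]

print(show(mutate(("f", "a"), ("g", "b"))[0]))
print(show(mutate(("fg", "a"), ("gf", "b"))[0]))
```

For the goal fga and gfb the function selects s = fgx and t = gfy rather than fx and gy, because only the former choice attains the T-height of the goal.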

Mutate always applies to a goal $u \approx v$, because of the definition of T, as long
as T is extended to cover all the symbols that appear in $u \approx v$ but do not appear
in E, as explained above.

We also have an inference rule:

Clash

$$\frac{u \approx v \cup G}{\bot}$$

if there is an s and t with $s\rho = u$, $t\rho = v$, and s and t are not E-unifiable. If
the symbol $\bot$ appears in a goal, then that goal will never yield an E-unifier. An
example is:

Example 8. Let $E = \{fggx \approx gffx\}$. Let T be the E-closed set
$\{fx, gy, ffz, ggw, c, d\}$. Suppose that the goal is $ffa \approx ga$. If s = ffz and
t = gy, then $\rho = [z \mapsto a, y \mapsto a]$ is a matcher. But ffz and gy are not unifiable.
The Clash rule applies:

$$\frac{ffa \approx ga}{\bot}$$

So ffa and ga are not E-unifiable. Interestingly, we could have chosen fx and
gy from T. Those terms are E-unifiable. Therefore Mutate would have applied.
If we kept applying the inference rules in that fashion, then we would not halt.
That is why it is necessary to choose s and t to minimize the T-height, and why
it is necessary that T is closed in order for this algorithm to halt.

We now prove the soundness of our inference rule. 

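Theorem 1. Let s, t, u and v be terms, and let $\rho$, $\sigma$ and $\theta$ be substitutions such
that $s\rho = u$, $t\rho = v$, and $\sigma \in CSU_E(s, t)$. Suppose that for all $x \in Var(s \approx t)$,
$x\rho\theta =_E x\sigma\theta$. Then $u\theta =_E v\theta$.

Proof. Since $x\rho\theta =_E x\sigma\theta$ for all variables in s and t, then by the properties of
substitutions: $s\rho\theta =_E s\sigma\theta$ and $t\rho\theta =_E t\sigma\theta$. Hence $u\theta = s\rho\theta =_E s\sigma\theta =_E t\sigma\theta =_E
t\rho\theta = v\theta$. (Here the third equality holds because $\sigma \in CSU_E(s, t)$.) □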

Now we prove the completeness of the rule.

Theorem 2. Suppose there exists $\theta$ such that $u\theta =_E v\theta$, and there is a matcher
$\rho$ such that $s\rho = u$ and $t\rho = v$, for some $s, t \in T$. Then there must be a
substitution $\sigma \in CSU_E(s, t)$ such that $x\rho\theta =_E x\sigma\theta$ for all variables in $Var(s, t)$.

Proof. Since $u\theta =_E v\theta$, and $\rho$ is the matcher, $s\rho\theta =_E t\rho\theta$. Hence there must be
a $\sigma \in CSU_E(s, t)$ and a substitution $\tau$ such that $\sigma\tau =_E \rho\theta$. Then $x\rho\theta =_E x\sigma\tau = x\sigma\sigma\tau$, since we
assume every substitution is idempotent. Furthermore, $x\sigma\sigma\tau =_E x\sigma\rho\theta = x\sigma\theta$,
because $\rho$ does not apply to any variables in $Ran(\sigma)$. □

Our algorithm is defined in terms of the Mutate inference rule:

$$\frac{u \approx v}{\bigcup_{1 \le i \le n} x_i\rho \approx x_i\sigma}$$

Recall that $u = s\rho$ and $v = t\rho$. Since s and t are from T, and we are assuming
that $u \approx v$ has no repeated variables, we can divide the variables $\{x_1, \dots, x_n\}$
into disjoint sets Y and Z such that Y contains all the variables in s, and Z
contains all the variables in t.

Then the algorithm we will describe in this section is as follows. Suppose we
want to solve the E-unification problem for a single equation $u \approx v$. If u is a
variable, then we return the substitution $[u \mapsto v]$. If v is a variable we return
$[v \mapsto u]$. Otherwise, find an s and t as required in the inference rule. Then for
every $\sigma \in CSU_E(s, t)$ we will recursively solve $z\sigma \approx z\rho$ for all $z \in Z$. Assume
these recursive calls to solve $z\sigma \approx z\rho$ all return an E-unifier. Then let $\theta'$ be
the union of all the unifiers. Since $u \approx v$ will be assumed to have no repeated
variable, and since each substitution in the complete set of unifiers of two terms
in T will be linear, the union is well-defined.² Then we apply $\theta'$ to each equation
$y_j\rho \approx y_j\sigma$. The result of the application of $\theta'$ will be $y_j\rho \approx y_j\sigma\theta'$, since $\theta'$ does
not apply to any of the variables in the range of $\rho$. Let $\theta''$ be the union of all of
these unifiers obtained from recursive calls on $y_j\rho \approx y_j\sigma\theta'$. Then the unifier of
$u \approx v$ is $\theta'\theta''$. If any of the recursive calls returns $\bot$, then solve will also return
$\bot$. See the algorithm in Figure 1. We must prove that the algorithm will halt.
We will prove it halts by giving a bound on the number of recursive calls. In
order to do so, we also give a bound on the T-heights of the terms in the ranges
of the E-unifiers which are generated.

We make the algorithm nondeterministic by using a choose function.³ This
makes it easier to define. We must take this into account when we analyze the
complexity. The function choose will select one E-unifier out of a set of E-unifiers.
The end of the algorithm results in one E-unifier. Each possible choice in this
algorithm would supply a complete set of E-unifiers. This set of E-unifiers may
contain some occurrences of $\bot$, since some choices may not give an E-unifier.
Then just remove $\bot$ from the set.

In Figure 2, we give an example of performing the algorithm on the goal
$fffu_1 \approx ggggu_2$, with the equational theory $E = \{fgfx \approx gfgx\}$ and
$T = \{fx, gy, fgz, gfw\}$. In this example, after the inference rule, the right branch is
always calculated first. That determines a unifier, which is applied to the left
branch. Therefore, each left child is shown with the calculated unifier already
applied.

² The union of anything with $\bot$ is $\bot$.

³ In a deterministic algorithm, choose would be replaced by a loop.

function solve(u ≈ v)
    if u is a variable
        return [u ↦ v]
    if v is a variable
        return [v ↦ u]
    find s, t and ρ as in definition of inference rule
    if s and t are not unifiable
        return ⊥
    choose σ in CSU_E(s ≈ t)
    for i = 1 to q
        θ_i = solve(z_i σ ≈ z_i ρ)
    θ' = θ_1 ∪ ··· ∪ θ_q
    for j = 1 to r
        θ_j = solve(y_j ρ ≈ y_j σ θ')
    θ'' = θ_1 ∪ ··· ∪ θ_r
    return θ' θ''

Fig. 1. Algorithm

4 Decidability and Complexity 

We will prove that the size of the proof for $u \approx v$ is bounded. The proof is defined
as a tree of equations, with $u \approx v$ at the root and for each node e, the children of
e are obtained by our inference rule. As we explained in the algorithm, Mutate
is applied as long as possible in a depth-first fashion, until we reach leaves of
the form $x \approx t$ or $t \approx x$, where x is a variable and t is any term. This defines
the mgu $\theta_1$ which is applied to the rest of the equations in the goal. The leaves
are then counted as solved. Then another equation is selected and the process is
repeated. The size of a proof is defined to be the number of non-leaf equations
in the proof tree. We will show that if all non-constant function symbols are
monadic, then the size of a proof tree of $u \approx v$ is less than or equal to $|u| \times |v|$.

Theorem 3. Assume that E is a linear equational theory, containing only
monadic function symbols, and that T is a finite E-closed set. The size of the
proof-tree of a goal of varity 1, $u \approx v$, is less than or equal to $|u| \times |v|$. If x and
y are variables in u and v respectively, and $\theta$ is a unifier of u and v obtained in
the proof, then $|x\theta| \le |v|$ and $|y\theta| \le |u|$.

Fig. 2. Proof Tree

Proof. The proof will be by induction on the sum of sizes of the terms in the
equation $u \approx v$, i.e., $|u| + |v|$. The base case is when $|u| = 0$ or $|v| = 0$. In that
case, $u \approx v$ is in normal form.⁴ Therefore, the proof is of size 0, since we ignore
leaf nodes in the tree.

Now assume that $|u| > 0$ and $|v| > 0$. Assume that the theorem is true for
each equation with sum of term sizes smaller than $|u| + |v|$. First we must prove
that induction is applicable, i.e. that the size is decreased with the application
of the inference rule. An application of the rule with monadic terms will have
the following form:

$$\frac{s\rho[x_u] \approx t\rho[y_v]}{x_s\rho[x_u] \approx x_s\sigma[z_1] \qquad y_t\sigma[z_1] \approx y_t\rho[y_v]}$$

The right equation $y_t\sigma[z_1] \approx y_t\rho[y_v]$ is solved first and yields $\theta_1$. The new
goal-equation is then $x_s\rho[x_u] \approx x_s\sigma[z_1]\theta_1$, which yields $\theta_2$.

⁴ Since $u \approx v$ has no repeated variables, it cannot be of the form $x \approx w[x]$ for some
term w.

where $u \approx v$ is our goal, $s, t \in T$, $s\rho = u$, $t\rho = v$, $x_u$ is the only variable in
u, $y_v$ is the only variable in v, $x_s$ is the only variable in s, $y_t$ is the only variable
in t, $z_1$ is a variable possibly introduced by the unifier $\sigma$ of $s[x_s]$ and $t[y_t]$.⁵

In order to apply induction, we need to establish that the size of an equation
gets smaller with the application of the rule.

Claim. $|y_t\sigma| + |y_t\rho| < |s\rho| + |t\rho|$.

Proof of Claim. $|y_t\sigma| \le 1$, because $y_t\sigma \in T$ or $y_t\sigma$ is a variable. $|y_t\rho| = |t\rho| - 1$,
because by definition: $|t\rho| = 1 + |y_t\rho|$. Hence $|y_t\sigma| + |y_t\rho| \le 1 + |t\rho| - 1 = |t\rho| <
|s\rho| + |t\rho|$, because $|s\rho| \ge 1$, since s is not a variable. □

Having proved this claim, we can state, by the induction assumption, that
the size of the proof-tree for $y_t\sigma \approx y_t\rho$ is less than or equal to $|y_t\sigma| \times |y_t\rho| \le
1 \times (|t\rho| - 1) = |t\rho| - 1$. Also, $|z_1\theta_1| \le |y_t\rho| = |t\rho| - 1$ and $|y_v\theta_1| \le |y_t\sigma| \le 1$,
where $\theta_1$ is the unifier obtained in the proof.

Claim. $|x_s\rho| + |x_s\sigma\theta_1| < |s\rho| + |t\rho|$.

Proof of Claim. By the definition of the size of term: $|x_s\rho| = |s\rho| - 1$. (This is
because: $|s\rho| = 1 + |x_s\rho|$, where s is in T.) The size of the term: $|x_s\sigma[z_1]\theta_1| \le
1 + |z_1\theta_1|$, because $x_s\sigma \in T$ or is a variable. We have shown that $|z_1\theta_1| \le |t\rho| - 1$.
Hence, $|x_s\sigma\theta_1| \le 1 + |t\rho| - 1 = |t\rho|$. Taking together the sizes of these two terms,
we get: $|x_s\rho| + |x_s\sigma\theta_1| \le |s\rho| + |t\rho| - 1 < |s\rho| + |t\rho|$. □

It follows from this claim that the size of the proof tree for $x_s\rho \approx x_s\sigma\theta_1$ is less
than or equal to $|x_s\rho| \times |x_s\sigma\theta_1| \le (|s\rho| - 1) \times |t\rho|$. Also, $|x_u\theta_2| \le |x_s\sigma\theta_1| \le |t\rho|$
and $|z_2\theta_2| \le |x_s\rho| = |s\rho| - 1$, where $z_2$ is a variable possibly introduced by the
substitution $\theta_1$.

Taking together these two statements, we can assess the size of the proof-tree
for $s\rho \approx t\rho$. It is less than or equal to $1 + |t\rho| - 1 + ((|s\rho| - 1) \times |t\rho|) = |s\rho| \times |t\rho|$.
Also, $|x_u\theta_1\theta_2| = |x_u\theta_2| \le |t\rho|$ and $|y_v\theta_1\theta_2| = |y_v\theta_1[z_2]\theta_2| \le |y_v\theta_1| + |z_2\theta_2| \le
1 + |s\rho| - 1 = |s\rho|$. □
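The final count in this proof can be spelled out term by term. In the sum below, the leading 1 is the root node $s\rho \approx t\rho$, the second term bounds the proof tree of the right equation, and the third term bounds the proof tree of the new goal-equation; this is only a restatement of the arithmetic above.

```latex
\begin{align*}
1 + \bigl(|t\rho| - 1\bigr) + \bigl(|s\rho| - 1\bigr)\,|t\rho|
  &= |t\rho| + |s\rho|\,|t\rho| - |t\rho| \\
  &= |s\rho| \times |t\rho| .
\end{align*}
```

For instance, with $|s\rho| = 3$ and $|t\rho| = 4$ the three contributions are at most 1, 3 and 8, which add up to the bound 12.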

The theorem gives us the first major complexity result of the paper. 

Theorem 4. Let $u \approx v$ be a goal with no repeated variables. Let E be a linear
equational theory, containing only monadic function symbols. Let n be the size
of $u \approx v$, defined in the standard way. Then

- The nondeterministic algorithm in Figure 1 finds a set of E-unifiers for
$u \approx v$ in nondeterministic time $O(n^2)$.

- Any E-unifier that is constructed is of size $O(n)$.

- If every pair of terms in T has a most general E-unifier, then the algorithm
is deterministic, and runs in deterministic time $O(n^2)$.

® Technically, we need to show that the new equations generated are of varity 1. We 
show this in the full paper[8]. 
In order to deal with the more general case of non-monadic terms, we will
consider the height of a term and the height of a proof-tree, in order to get an idea
about the complexity of the procedure. The height of a term was defined earlier.
The height of a proof-tree is the length of the longest branch in the proof-tree,
excluding its leaf. We write the height of the proof-tree of $u \approx v$ as $H(u \approx v)$.
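
As a small illustration of the two notions of height, the following Python sketch computes the length of the longest branch of a tree, not counting the leaf. The class `Tree` and the function `height` are hypothetical names introduced here. The sketch also assumes that variables and constants have height 0, which is consistent with the equation $H(t\rho) = 1 + \max\{H(y_i\rho)\}$ used in the proof but is not restated in this section.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Tree:
    """A term or a proof-tree node (hypothetical representation)."""
    label: str
    children: Tuple["Tree", ...] = ()


def height(tree: Tree) -> int:
    """Length of the longest branch, excluding its leaf.

    Assumption: a node without children (a variable, a constant, or a
    goal already in normal form) has height 0.
    """
    if not tree.children:
        return 0
    return 1 + max(height(child) for child in tree.children)


# The term f(g(x), y) has height 2.
term = Tree("f", (Tree("g", (Tree("x"),)), Tree("y")))
# A proof-tree that consists of a single leaf has height 0.
leaf_only = Tree("goal in normal form")
```

The same recursion serves both for terms and for proof-trees, since in both cases leaves are not counted.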

The general case of the application of our rule is as in the following diagram: 



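[Diagram: the root goal $s\rho \approx t\rho$ is decomposed into the right subgoals
$y_i^t\sigma \approx y_i^t\rho$, whose solution yields $\theta_1$, and the left subgoals
$x_i^s\rho \approx x_i^s\sigma\theta_1$ (the new goal-equations), whose solution yields $\theta_2$.]

where $u \approx v$ is our goal, $s, t \in T$, $s\rho = u$, $t\rho = v$, $x_1, \ldots, x_m$ are the variables
in $u$, $y_1, \ldots, y_n$ are the variables in $v$, $x_1^s, \ldots, x_k^s$ are the variables in $s$,
$y_1^t, \ldots, y_r^t$ are the variables in $t$, and $z_1, \ldots, z_p$ are variables possibly
introduced by the unifier $\sigma$ of $s$ and $t$.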

Theorem 5. Assume $T$ is a finite E-closed set, $E$ is linear, and the goal $u \approx v$
is of varity 1, where $u$ and $v$ are not both variables. The height of a proof-tree
of $u \approx v$ is less than or equal to $H(u) + H(v) - 1$. If $x_1^u, \ldots, x_m^u$ and $y_1^v, \ldots, y_n^v$
are variables in $u$ and $v$ respectively, and $\theta$ is a unifier of $u$ and $v$ obtained in
the proof, then $H(x_i^u\theta) \le H(v)$ and $H(y_j^v\theta) \le H(u)$.

Proof. The proof will be by induction on $H(u) + H(v)$. The base case is when
$H(u) = 0$ or $H(v) = 0$. In that case $u \approx v$ is in normal form. Therefore the proof
is of height 0, since we ignore leaf nodes when calculating height.

Now assume that $H(u) > 0$ and $H(v) > 0$. Assume that the theorem is true
for each equation with sum of heights smaller than $H(u) + H(v)$. First let us
consider the right equation: $y_i^t\sigma \approx y_i^t\rho$.

Claim. $H(y_i^t\sigma) + H(y_i^t\rho) < H(s\rho) + H(t\rho)$

Proof of Claim. $H(y_i^t\sigma) \le 1$, because $y_i^t\sigma$ is in $T$ or is a variable.
$H(y_i^t\rho) \le H(t\rho) - 1$, because, according to the definition of height,
$H(t\rho) = 1 + \max\{H(y_i^t\rho)\}$. Hence
$H(y_i^t\sigma) + H(y_i^t\rho) \le 1 + H(t\rho) - 1 = H(t\rho) < H(s\rho) + H(t\rho)$.
□



By the induction assumption, if $H(y_i^t\sigma \approx y_i^t\rho) \ne 0$, then
$H(y_i^t\sigma \approx y_i^t\rho) \le H(y_i^t\sigma) + H(y_i^t\rho) - 1$, for every $i \in \{1, \ldots, r\}$.
We know $H(y_i^t\sigma) \le 1$, and we know $H(y_i^t\rho) \le H(t\rho) - 1$. Hence, we know that
the height of this proof-tree is:
$H(y_i^t\sigma \approx y_i^t\rho) \le 1 + H(t\rho) - 1 - 1 = H(t\rho) - 1$. If $H(y_i^t\sigma \approx y_i^t\rho) = 0$, then
$H(y_i^t\sigma \approx y_i^t\rho) = 0 \le H(t\rho) - 1$, since $H(t\rho) \ge 1$.

By induction we also know that:

- $H(z_j\theta_1) \le H(y_i^t\rho) \le H(t\rho) - 1$ for each $z_j$ in $y_i^t\sigma$,

- $H(y_j\theta_1) \le H(y_i^t\sigma) \le 1$ for each $y_j$ in $y_i^t\rho$.



Now, consider the left part of the proof-tree. 

Claim. $H(x_i^s\rho) + H(x_i^s\sigma\theta_1) < H(s\rho) + H(t\rho)$

Proof of Claim. $H(x_i^s\rho) \le H(s\rho) - 1$, from the definition of height.
$H(x_i^s\sigma\theta_1) \le H(x_i^s\sigma) + \max\{H(z_i\theta_1)\}$, where $\{z_1, \ldots, z_k\}$ are the variables
in $s\sigma$. $\max\{H(z_i\theta_1)\} \le H(t\rho) - 1$, from the analysis of the right equation.
Hence $H(x_i^s\sigma\theta_1) \le 1 + H(t\rho) - 1 = H(t\rho)$. Therefore,
$H(x_i^s\rho) + H(x_i^s\sigma\theta_1) \le H(s\rho) - 1 + H(t\rho) < H(s\rho) + H(t\rho)$. □

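Hence, by the induction assumption, we know that, if $H(x_i^s\rho \approx x_i^s\sigma\theta_1) \ne 0$,
then $H(x_i^s\rho \approx x_i^s\sigma\theta_1) \le H(x_i^s\rho) + H(x_i^s\sigma\theta_1) - 1$. Now, $H(x_i^s\rho) \le H(s\rho) - 1$,
because according to the definition of height of a term, $H(s\rho) = 1 + \max\{H(x_i^s\rho)\}$.
Also, $H(x_i^s\sigma[z_1, \ldots, z_p]\theta_1) \le 1 + \max\{H(z_i\theta_1)\} \le H(t\rho)$, because
$H(z_i\theta_1) \le H(t\rho) - 1$, as shown above for the right equation. Hence, the height of
this proof-tree will be:

- $H(x_i^s\rho \approx x_i^s\sigma\theta_1) \le H(s\rho) - 1 + H(t\rho) - 1 = H(s\rho) + H(t\rho) - 2$.

If $H(x_i^s\rho \approx x_i^s\sigma\theta_1) = 0$ then $H(x_i^s\rho \approx x_i^s\sigma\theta_1) \le H(s\rho) + H(t\rho) - 2$, because
$H(s\rho) \ge 1$ and $H(t\rho) \ge 1$.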

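The induction assumption also states that

- $H(x_j\theta_2) \le H(x_i^s\sigma\theta_1) \le 1 + \max\{H(z_i\theta_1)\} \le 1 + H(t\rho) - 1 = H(t\rho)$, for each
$x_j$ in $x_i^s\rho$, and

- $H(z_j\theta_2) \le H(x_i^s\rho) \le H(s\rho) - 1$, for all $z_j$ in $x_i^s\sigma\theta_1$.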

We can now prove the main claim: 

The height of the proof-tree for $u \approx v$, i.e. for $s\rho \approx t\rho$, is then:

$H(s\rho \approx t\rho) \le 1 + \max\{H(x_i^s\rho \approx x_i^s\sigma\theta_1), H(y_i^t\sigma \approx y_i^t\rho)\}
\le 1 + \max\{H(s\rho) + H(t\rho) - 2, H(t\rho) - 1\} = 1 + H(s\rho) + H(t\rho) - 2 = H(s\rho) + H(t\rho) - 1$.
This is because $H(s\rho) + H(t\rho) - 2 \ge H(t\rho) - 1$, because we assumed $H(s\rho) > 0$.

We only need to prove the claims about the heights of terms:

$H(x_j\theta_1\theta_2) = H(x_j\theta_2)$, because $x_j$ cannot be in the domain of $\theta_1$. By the
assumption, $H(x_j\theta_2) \le H(t\rho)$.

$H(y_j\theta_1\theta_2) \le H(y_j\theta_1[z_1, \ldots, z_l]) + \max\{H(z_j\theta_2)\} \le 1 + H(s\rho) - 1 = H(s\rho)$.

□



This gives us the following complexity result. 
Theorem 6. Let $u \approx v$ be a goal with no repeated variables. Let $E$ be a linear
equational theory. Let $n$ be the size of $u \approx v$, defined in the standard way. Then

- The nondeterministic algorithm in Figure 1 finds a set of E-unifiers for
$u \approx v$ in PSPACE.

- The terms in the range of the E-unifier that is constructed are of height
$O(n)$.

5 Finding a Closed Set 

We have shown that once you have an E-closed set, then unification problems of
varity 1 are solvable, and we have given the complexity of the decision problem
in several cases. That all assumes that we know of an E-closed set. That could
be the case for some equational theories. But if we don't know whether there is
an E-closed set, then in this section we give a method to produce one which will
work for some equational theories.

First we show how to construct an E-closed set in an incremental way:

Let $T_0$ contain all terms of the form $f(x_1, \ldots, x_n)$, where $f$ is a function
symbol of arity $n > 0$ appearing in $E$, and $x_1, \ldots, x_n$ are fresh variables. Also,
$T_0$ will contain two fresh constants $c$ and $d$.

For $i \ge 0$, $T_{i+1}$ is defined as the set of terms such that $t \in T_{i+1}$ if and only if $t$
is a nonvariable such that there exist some $u$ and $v$ in $T_i$, a variable $x$ appearing
in $u$, and a $\sigma \in CSU_E(u \approx v)$ such that $t$ is a renaming of a subterm of $x\sigma$.

Let $T = \bigcup_{i \ge 0} T_i$. Then $T$ is an E-closed set if the complete sets of unifiers
for pairs of terms in $T$ are linear. Of course, $T$ might not be finite. But if $T$
is finite, then this gives us a decision procedure for solving the E-unification
problem when the goal has no repeated variables.

We still have not said how to find a complete set of E-unifiers for a pair of
terms. This problem is undecidable in general, but in some cases it is possible to
use a complete algorithm to generate the E-unifiers. One possibility is to use the
complete procedure for linear equational theories presented in [7]. The inference
system in that paper is a generalization of the General Mutate inference rules
of [2, 3, 4, 5], but it is complete for all linear equational theories. It uses a form of
eager variable elimination which makes it more efficient.

The problem with using a complete inference system is that it may not halt
when two terms are not E-unifiable. However, we also need to check cases of
non-unifiability for our algorithm. But inference rules such as the ones in [7]
can be extended to detect non-unifiability in some cases where the procedure
would normally not halt. The inference rules are goal directed, in the sense that
the procedure begins with the equation which must be E-unified. As in the algorithm in
this paper, an inference rule will be applied to the goal yielding one or more
subgoals. Also, as in this paper, one or more rules may apply at each point. So
the algorithm amounts to the simultaneous construction of one or more proof-trees.
In some cases, it happens that every proof tree contains an equation $u \approx v$
that is a descendant of a renaming of an equation $s \approx t$, such that $s\rho = u$ and
$t\rho = v$ for some $\rho$. In such cases, the algorithm will never halt, and therefore the
initial equation is not E-unifiable.

6 Conclusion 

Historically, much of the field of automated deduction has focused on inference 
procedures that search for a proof of a theorem, and not as much effort has been 
applied to finding methods of proving something is false. However, if these meth- 
ods can be applied to verification problems and other applications, we believe 
it is necessary to identify classes of problems where automated theorem provers 
will halt, and to understand the complexity of these classes. This is a goal of our 
research. 

The problems we considered in this paper are E-unification problems, since
equational logic is useful for many applications. The procedure we give in this
paper is an adaptation of a more general procedure for E-unification. However,
on the class of problems we consider in this paper, we were able to show a
measure on certain E-unification problems, such that the inference rules always
reduce the measure; therefore the procedure will halt and we can analyze how
quickly it will halt, in order to examine the complexity.

Specifically, we introduce a subclass of linear equational theories, called 
finitely closable. We consider goals with no repeated variables. We show that 
this class is solvable in PSPACE in general. For monadic theories, it is in NP. 
For unitary monadic theories, it is solvable in $O(n^2)$.

We think this class is interesting. We also think this research raises many 
questions to be explored further. Which equational theories are in this class? 
What is a good procedure for finding a finite (or recursive) E-closed set? Can
our complexity results be made better? How can this class be expanded? 
Abstract. Nontrivial meta-complexity theorems, proved once for a programming
language as a whole, facilitate the presentation and analysis of
particular algorithms. This paper gives a new meta-complexity theorem 
for bottom-up logic programs that is both more general and more accu- 
rate than previous such theorems. The new theorem applies to algorithms 
not handled by previous meta-complexity theorems, greatly facilitating 
their analysis. 



1 Introduction 

McAllester has recently shown that the running time of a bottom-up logic pro- 
gram can be bounded by the number of “prefix firings” of its inference rules [10]. 
A prefix firing of a rule is a derivable instantiation of a prefix of the antecedents 
of that rule. This single nontrivial meta-complexity theorem simplifies the pre- 
sentation and complexity analysis of a variety of parsing and static analysis algo- 
rithms. Many other algorithms, however, seem to fall outside of the range of this 
theorem. In particular, algorithms based on union-find or congruence closure can 
not be analyzed. A second meta-complexity theorem for the analysis of union- 
find based algorithms is also given in [10]. While this second theorem applies to 
a broader class of algorithms, it yields running time bounds that are often worse 
by logarithmic factors than algorithm-specific bounds — bounds proved without 
the use of a meta-complexity theorem. Here we prove a more accurate and more 
general meta-complexity theorem. The increased generality is achieved by prov- 
ing the theorem for logic programs with priorities and deletions. Priorities and 
deletions allow the simulation of arbitrary classical control structures. So the 
new meta-complexity theorem has, in some sense, universal coverage. The new 
theorem yields improvements in meta-complexity-derived bounds for a variety 
of algorithms including union-find and congruence closure. As an example, section 5
presents an algorithm for determining the satisfiability of a ground set of
Horn clauses with equality. The new meta-complexity theorem allows the simple
derivation of a very tight running time bound for this algorithm. Proving the
same bound for this problem without the use of our new theorem appears
to be significantly more difficult.
2 Inference Rules with Priorities and Deletions

We use the term inference rule to mean a first-order Horn clause, i.e., a formula
of the form $A_1 \wedge \ldots \wedge A_n \to C$ where $C$ and each $A_i$ is a first-order atom. We
will use assertion to mean a ground atom and use the term data base to mean
a set of assertions. If $R$ is a set of rules and $D$ is a data base, then we let $R(D)$
denote the set of ground atoms derivable from the ground set $D$ using the rules
in $R$.

Here we are interested in expressing algorithms with prioritized inference
rules with deletion. An inference rule with deletion is an expression of the form
$A_1 \wedge \ldots \wedge A_n \to C$ where $C$ is an atom and each $A_i$ is either an atom or an
expression of the form $[A]$ where $A$ is an atom. Intuitively, the marking $[\ldots]$
means that the premise is to be deleted as soon as the rule is run. Deletion
makes the behavior of the algorithm nondeterministic. For example, consider
the following rules with deletion.

$P \to Q$          $[Q] \to S$          $[Q] \to W$

Suppose the initial data base contains only P. The first rule fires adding 
the assertion Q. Now either the second or third rule can fire. Since each of 
these rules deletes Q, once one of them fires the other is blocked. Hence the 
final data base is nondeterministically either {P, S} or {P, W}. When viewing 
rules with deletions as algorithms this nondeterminism is viewed as “don’t care” 
nondeterminism — the choices are made arbitrarily and not backtracked. (In 
many cases this kind of don’t-care nondeterminism can be justified by a suitable 
notion of redundancy, cf. Section 8.) Suppose now that we have additional rules 
through which W entails a large number of additional facts whereas the absence 
of W does not. Then, in order to obtain a more efficient run of the rules, we 
should prefer to fire the second rule rather than the third rule, which we could 
achieve by giving higher priority to the second rule. In summary, allowing for 
deletion makes deduction nondeterministic, and hence priorities are needed for 
indicating which choices are to be made in order to increase efficiency, or to 
avoid unwanted results. 
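
The interplay of deletion and priority in this example can be made concrete with a minimal Python sketch. It handles propositional rules only, and the names `saturate`, `Premise` and `Rule` are hypothetical; it is an illustration of the semantics described above, not the algorithm behind the meta-complexity theorem.

```python
from typing import FrozenSet, List, Set, Tuple

# A premise is (atom, deleted); deleted=True models the bracket marking [A].
Premise = Tuple[str, bool]
Rule = Tuple[List[Premise], str]  # (premises, conclusion)


def saturate(database: Set[str], rules: List[Rule]) -> FrozenSet[str]:
    """Run propositional rules, listed in decreasing priority, to saturation."""
    asserted = set(database)   # atoms asserted so far
    deleted: Set[str] = set()  # atoms deleted so far (deletion is permanent)

    def visible(atom: str) -> bool:
        return atom in asserted and atom not in deleted

    while True:
        for premises, conclusion in rules:
            if not all(visible(atom) for atom, _ in premises):
                continue
            to_delete = {atom for atom, is_deleted in premises if is_deleted}
            # A rule is applicable only if firing it changes the state.
            if conclusion in asserted and to_delete <= deleted:
                continue
            asserted.add(conclusion)
            deleted |= to_delete
            break  # restart from the highest-priority rule
        else:
            # No rule is applicable: the state is saturated.
            return frozenset(a for a in asserted if a not in deleted)


p_to_q: Rule = ([("P", False)], "Q")
q_to_s: Rule = ([("Q", True)], "S")
q_to_w: Rule = ([("Q", True)], "W")

prefer_s = saturate({"P"}, [p_to_q, q_to_s, q_to_w])
prefer_w = saturate({"P"}, [p_to_q, q_to_w, q_to_s])
```

With the second rule given priority over the third, the visible atoms of the final state are P and S; swapping the two priorities yields P and W instead.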

The proof of the meta-complexity theorem requires that deletion be permanent:
once an assertion is deleted, further attempts to reassert it have no effect.
(If deletion is based on a notion of redundancy such as the one proposed in [2],
once an assertion has become redundant it remains so for the remainder of the
computation.) To see the problem with deletion consider the simple pair of rules
$[P] \to Q$ and $[Q] \to P$. If deletion can be revoked by subsequent assertion, the
rules can oscillate between a database containing $P$ and a database containing $Q$
and fail to terminate. To formalize this notion of permanent deletion we take a
state of the computation to be a set $S$ of literals (atoms and negations of atoms).
The presence of a negated atom $\neg A$ in a state indicates that $A$ should be considered
deleted. Hence, we say that an atom $A$ is visible in a state $S$ if $A \in S$ and
$\neg A \notin S$, while a negative literal $\neg A$ is called visible in $S$ whenever $\neg A \in S$. If $\sigma$
is a ground substitution and $[A]$ is a deleted antecedent then we define $\sigma([A])$ to
be the ground atom $\sigma(A)$. Now let $S$ be a state, and let $r$ be an inference rule
with deletions. We write $S \to_r S'$ if $S \ne S'$ and $r$ is a rule $A_1 \wedge \ldots \wedge A_n \to C$
such that there exists a ground substitution $\sigma$ defined on all the variables in
$r$ such that each $\sigma(A_i)$ is visible in $S$, and $S'$ is $S \cup \{\sigma(C), \neg\sigma(A_{i_1}), \ldots, \neg\sigma(A_{i_k})\}$
where $A_{i_1}, \ldots, A_{i_k}$ are the deleted antecedents of the rule. We say that a rule
$r$ is applicable at the state $S$ if there exists a state $S'$ (which must be different
from $S$) such that $S \to_r S'$.

Now let $R$ be a set of rules with deletions where each rule in $R$ is associated
with a positive rational number called its priority. We call $R$ a rule set with
priorities and deletions. For technical simplicity we may assume that priorities
are unique in that no two rules have the same priority. We say that a state $S$ is
visible to a rule $r \in R$ if no higher priority rule in $R$ is applicable at $S$. We write
$S \to S'$ if there exists a rule $r \in R$ such that $S$ is visible to $r$ and $S \to_r S'$. We
will say that a state $S$ is saturated under $R$ if it is a normal form, i.e., there is
no $S'$ such that $S \to S'$. An $R$-computation from a database $D$ is a sequence $S_0$,
$S_1, \ldots, S_T$ such that $S_0 = D$ and $S_t \to S_{t+1}$. An $R$-computation is called complete
if the final state $S_T$ is saturated. If there is a complete $R$-computation from $D$
ending in $S_T$ then we say that $S_T$ is an $R$-saturation of $D$. A rule set $R$ is said
to terminate on input database $D$ if there is no infinite $R$-computation from $D$.

A prefix firing in an $R$-computation $C$ is a triple $(r, \sigma, i)$, where $r \in R$ is a
rule $A_1 \wedge \ldots \wedge A_n \to C$ such that the computation $C$ contains a state $S$ visible to
$r$ and $\sigma$ is a ground substitution defined on the variables in the antecedent prefix
$A_1, \ldots, A_i$ such that the $\sigma(A_j)$, for $1 \le j \le i$, are visible in $S$. Note that the set
of prefix firings of a given rule is determined by the set of states visible to that
rule. For any $R$-computation $C$ we let $p(C)$ be the number of prefix firings in $C$.
We will call a rule range-restricted if every variable in the conclusion appears in
some antecedent. Bottom-up logic programs are generally range-restricted and
for simplicity we only consider range-restricted rules. In the following, by $|D|$ we
denote the size of a database which is the number of nodes in its fully shared
graphical representation by a dag.

Theorem 1. For any given set $R$ of range-restricted rules with priorities and
deletions there exists an algorithm mapping an input database $D$ to an $R$-saturation
$R(D)$ of $D$ whose running time is $O(|D| + \max_C p(C))$ where the maximization
is over all $R$-computations $C$ from $D$.^

The theorem extends the one in [10] to inference rules with priorities and deletion 
showing essentially that no penalty has to be paid for these extensions. The 
complexity can again be linearly bounded by the number of prefix firings. 

Before giving a proof of this theorem, in the next sections we will present a 
variety of applications. Before discussing those, the following example is given 
in order to clarify one of the more subtle issues behind our definitions. Consider 

^ Note that if there is no bound on the length of computations then the algorithm 
need not terminate. 
the rules r1 and r2, where

r1 : $r(x, y), [p(x)] \to s(x)$          r2 : $p(x), q(x, y) \to r(x, y)$

with priorities from left to right, on a database $D$ consisting of facts $p(i)$, for
$1 \le i \le n$, and $q(i, j)$, for $1 \le i, j \le n$. In any computation from $D$, whenever
rule r2 produces an r-fact $r(i, j)$, in the next step r1 takes priority over r2, and
the $p(i)$ is deleted so that no other fact $r(i, j')$ can be produced thereafter. Hence
any computation takes at most $2n$ steps. However, the number of prefix firings of
rule r2 is $n + n^2$, and that is the upper bound on the time complexity provided by
the meta-complexity theorem above. A more refined meta-complexity theorem, 
based on refined notions of prefix firings, could be stated. However in this paper 
we deliberately confine ourselves to the simpler version. The additional technical 
complexity does not appear to be required for the examples that we are interested 
in at present. 
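
The gap between the number of steps and the number of prefix firings in this example can be checked with a small simulation. The Python sketch below is only an illustration under simplifying assumptions: the function name `simulate` is hypothetical, rule instances are chosen in a fixed sorted order, and the prefix firings of r2 are counted in the initial state, which is visible to r2 and already contains every instance that is ever visible.

```python
from typing import Tuple


def simulate(n: int) -> Tuple[int, int]:
    """Return (number of computation steps, prefix firings of rule r2)."""
    p = set(range(1, n + 1))  # visible facts p(i)
    q = {(i, j) for i in range(1, n + 1) for j in range(1, n + 1)}
    r = set()                 # derived facts r(i, j)
    s = set()                 # derived facts s(i)

    # Prefix firings of r2 in the initial state: instances of the prefix
    # p(x), plus instances of the prefix p(x), q(x, y).
    prefix_firings_r2 = len(p) + sum(1 for (i, _) in q if i in p)

    steps = 0
    while True:
        # r1 has the higher priority: r(x, y), [p(x)] -> s(x)
        r1_instance = next(((i, j) for (i, j) in sorted(r) if i in p), None)
        if r1_instance is not None:
            i, _ = r1_instance
            p.discard(i)  # permanent deletion; no rule reasserts p(i)
            s.add(i)
            steps += 1
            continue
        # r2: p(x), q(x, y) -> r(x, y), applicable only if r(x, y) is new
        r2_instance = next(
            ((i, j) for (i, j) in sorted(q) if i in p and (i, j) not in r),
            None,
        )
        if r2_instance is not None:
            r.add(r2_instance)
            steps += 1
            continue
        return steps, prefix_firings_r2


steps, firings = simulate(4)
```

For n = 4 the simulation performs 8 steps, while the count of prefix firings of r2 is 4 + 16 = 20, in line with the bounds $2n$ and $n + n^2$ stated above.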

3 A Union-Find Algorithm 

This section presents an $O(n \log n)$ union-find algorithm given as a rule set with
priorities and deletions. This union-find algorithm both gives an example of
a use of theorem 1 and serves as a foundation for other algorithms given in 
later sections of this paper. The union-find algorithm in itself is perhaps not 
significantly simpler than classical presentations using pointers and recursive 
procedures. However its direct relation to the usual inference rules for Knuth- 
Bendix completion makes correctness arguments more straightforward. 

The union-find algorithm is used to represent equivalence relations. In the 
inference rule union-find algorithm U in Figure 1 we assume a binary predicate 
union such that the assertion union(x, y) means that x and y are to be made 
equivalent — the procedure is to compute the least equivalence relation such 
that if the data base contains union(x, y) then x and y are equivalent. The find 
function is defined in terms of a more basic rewrite relation which we represent 
here as a set of assertions of the form $x \to y$. We define the “find” of $x$ to be the
normal form of $x$ under the rewrite relation $\to$. Storing this relation explicitly
as assertions in the data base defines in a more logical manner what is usually 
implemented with pointer structures. 

The union-find inference system implements Knuth-Bendix completion for 
the simple case of equations between constants. The equations are represented 
by the union facts. The rules (F1) and (F2) compute the normal forms of terms.
(U2)-(U4) orient equations into rewrite rules using an ordering that is dynamically
determined by the weight computation in rules (U3) and (U4). If $\to^{*}$ is the
reflexive-transitive closure of $\to$, the weight of $y$ is the number of nodes $x$ such
that $x \to^{*} y$. An assertion $x \to y$ is to be added only for irreducible $y$ for which
$y \to^{!} y$. The rules (F1)-(F2) are to run at a higher priority than any other rules
mentioning the predicates find, $\to$ or $\to^{!}$. This ensures that, at any state visible
to other rules mentioning these relations, the relation $\to^{!}$ is the fixed “normal
form relation” determined by the $\to$ relation.
(F1)   find($x$)
       ----------------------------------------
       $x \to^{!} x$,  weight($x$, 1)

(F2)   [$x \to^{!} y$]
       $y \to z$
       ----------------------------------------
       $x \to^{!} z$

(U1)   union($x$, $y$)
       ----------------------------------------
       find($x$),  find($y$)

(U2)   [union($x$, $y$)]
       $x \to^{!} z$
       $y \to^{!} z$
       ----------------------------------------
       $\top$

(U3)   [union($x$, $y$)]
       $x \to^{!} z_1$
       $y \to^{!} z_2$
       weight($z_1$, $w_1$)
       [weight($z_2$, $w_2$)]
       $w_1 \le w_2$
       ----------------------------------------
       $z_1 \to z_2$,  weight($z_2$, $w_1 + w_2$)

(U4)   [union($x$, $y$)]
       $x \to^{!} z_1$
       $y \to^{!} z_2$
       [weight($z_1$, $w_1$)]
       weight($z_2$, $w_2$)
       $w_1 > w_2$
       ----------------------------------------
       $z_2 \to z_1$,  weight($z_1$, $w_1 + w_2$)

Fig. 1. Module U for union-find. We write rules vertically as the antecedents followed
by a horizontal line followed by the conclusions. The rules are listed in decreasing
priority. Multiple conclusions $A_1, \ldots, A_k$ should be viewed as a single atom $a(A_1, \ldots, A_k)$
with auxiliary rules (of highest priority) generating the individual conjuncts $A_i$.



All of the rules represented by (Ul) run at higher priority than (U2), (U3), 
or (U4). This implies that in any state visible to (U2), (U3), or (U4) we have 
that find(x) and find(y) have been asserted and hence the normal forms of x and
y have been computed and the weights have been initialized. Rule (U2) is given 
higher priority than either (U3) or (U4). This implies that at any state visible 
to (U3) or (U4) we have that x and y have distinct normal forms (otherwise the 
state would be visible to rule (U2) which would then delete the link assertion 
and assert the trivial “true” assertion T). 

The use of addition in the rules (U3) and (U4) is outside of the formal 
language defined in section 2. However, the use of addition in the conclusion 
can be replaced by an additional final antecedent of the form $w_3 = w_1 + w_2$.
Theorem 1 can be generalized to handle constraint antecedents provided that 
the set of assignments of values to the unassigned variables (those not appearing 
in earlier antecedents) can be computed in time proportional to the number of 
such assignments. 

One should think of the rules (F1)-(U4) as a module U that takes as input 
assertions of the form find(x) and union(x, y) and produces as output assertions
of the form $x \to^{!} z$ such that $z$ is the normal form of $x$ in a canonical rewrite
system generated from the equations union(x, y). The input can be extended
dynamically by new assertions of the form find(x) and union(x, y) generated
from additional rules that are compatible with U. 
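
For comparison with the classical pointer-based presentation, the following Python sketch implements weighted union without path compression, with comments relating each branch to the rule it loosely corresponds to. It is an imperative analogue under stated assumptions, not the rule set itself: the class `UnionFind` and its method names are hypothetical, and the rewrite relation is stored as a `parent` dictionary.

```python
from typing import Dict, Hashable


class UnionFind:
    """Weighted union-find without path compression (illustrative sketch)."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}  # the rewrite relation x -> y
        self.weight: Dict[Hashable, int] = {}       # weights, as in weight(x, w)

    def find(self, x: Hashable) -> Hashable:
        """Return the normal form of x under the rewrite relation."""
        if x not in self.weight:
            self.weight[x] = 1          # compare (F1): a new element has weight 1
        while x in self.parent:         # compare (F2): follow rewrite steps
            x = self.parent[x]
        return x

    def union(self, x: Hashable, y: Hashable) -> None:
        z1, z2 = self.find(x), self.find(y)  # compare (U1)
        if z1 == z2:                         # compare (U2): nothing to do
            return
        w1, w2 = self.weight[z1], self.weight[z2]
        if w1 <= w2:                         # compare (U3): orient z1 -> z2
            self.parent[z1] = z2
            self.weight[z2] = w1 + w2
        else:                                # compare (U4): orient z2 -> z1
            self.parent[z2] = z1
            self.weight[z1] = w1 + w2

    def depth(self, x: Hashable) -> int:
        """Number of rewrite steps from x to its normal form."""
        steps = 0
        while x in self.parent:
            x = self.parent[x]
            steps += 1
        return steps


uf = UnionFind()
for a, b in [(0, 1), (2, 3), (4, 5), (6, 7), (0, 2), (4, 6), (0, 4)]:
    uf.union(a, b)
```

Because the lighter class is always rewritten to the heavier one, the number of rewrite steps from any of the 8 elements to its normal form stays within $\log_2 8 = 3$ in this run.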
Definition 1. A module consists of a rule set with priorities and deletions plus
specified input and output predicate symbols. All other predicates of the module 
are called local. A rule set R will be called compatible with a module M provided 
that it does not mention local predicates of M , and 

— no output predicate of M appears in any conclusion or deleted antecedent of 
a rule in R, 

— no input predicate of M appears in any deleted antecedent of R, 

— and every rule in R containing an output predicate of M in an antecedent 
has lower priority than all rules in $M$.

An initial database D will be called compatible with a module M if it does not 
mention any predicates of M other than input predicates. 



Theorem 2. The union-find module $U$ has the property that for any rule set
$R$ and initial database $D$, where $R$ and $D$ are both compatible with $U$, and any
$(R \cup U)$-computation $C$ from $D$, the total number of prefix firings in $C$ of the
rules in $U$ is $O(m + n \log n)$ where $m$ is the number of union assertions in $D$ or
produced by $R$, and $n$ is the number of distinct terms appearing in union or find
assertions.

Proof. Note that each non-redundant union operation generates a single new
assertion of the form $x \to y$ where the weight of $y$ prior to the addition of this
assertion is at least as large as the weight of $x$. This implies that weight at least
doubles as one moves across any assertion of the form $x \to y$. So for a given $x$
the set of $y$ such that $x \to^{*} y$ can have at most $\log n$ elements. (At most $n - 1$
rewrite rules $x \to y$ can be generated until all terms become equal.)

Now we show that each of the rules in U has an appropriate number of prefix 
firings. The rule (F1) has at most $n$ firings. It follows from the above comments
that there are at most $n \log n$ assertions ever generated of the form $x \to^{!} y$. This,
and the fact that the out-degree of $\to$ is at most one, imply that rule (F2) has at
most $n \log n$ prefix firings. Rule (U1) has at most $m$ firings. All states containing
the assertion union(a;, y) and visible to any of the rules (U2), (U3), or (U4) 
must assign the same unique normal forms and weights to x and y. This implies 
that the rules (U2), (U3), and (U4) also have at most m prefix firings. 

It is possible to give a more complex inference rule implementation of union-find
that runs in $O(n\alpha(n))$ time where $\alpha$ is the inverse of Ackermann’s function.
However, many algorithms based on union-find run in $O(n \log n)$ time even
when using an $O(n\alpha(n))$ implementation of union-find. For such applications an
$O(n \log n)$ implementation of union-find suffices.

4 Congruence Closure 

The congruence closure problem is to determine whether an equation s = t be- 
tween ground terms is provable from a given set of equations between ground 
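(C1)   find($\langle x, y \rangle$)
       ----------------------------------------
       find($x$),  find($y$),  init$_{\Rightarrow}$($\langle x, y \rangle$)

(C2)   [init$_{\Rightarrow}$($z$)]
       $z \Rightarrow w$
       ----------------------------------------
       $\top$

(C3)   [init$_{\Rightarrow}$($z$)]
       ----------------------------------------
       $z \Rightarrow z$

(C4)   [$\langle x, y \rangle \Rightarrow z$]
       $x \to^{!} x'$
       $\langle x', y \rangle \Rightarrow z'$
       ----------------------------------------
       union($z$, $z'$)

(C5)   [$\langle x, y \rangle \Rightarrow z$]
       $x \to^{!} x'$
       ----------------------------------------
       $\langle x', y \rangle \Rightarrow z$

(C6)   [$\langle x, y \rangle \Rightarrow z$]
       $y \to^{!} y'$
       $\langle x, y' \rangle \Rightarrow z'$
       ----------------------------------------
       union($z$, $z'$)

(C7)   [$\langle x, y \rangle \Rightarrow z$]
       $y \to^{!} y'$
       ----------------------------------------
       $\langle x, y' \rangle \Rightarrow z$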



Fig. 2. Rules for congruence closure listed in order of decreasing priority 



terms using the reflexivity, symmetry, transitivity and congruence rules for equal- 
ity. Here we assume that expressions are represented using constants and a single 
pairing function. The congruence property of the pairing function states that if
$u_1 = w_1$ and $u_2 = w_2$ then $\langle u_1, u_2 \rangle = \langle w_1, w_2 \rangle$.
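
To make the problem statement concrete, the following Python sketch decides congruence closure for terms built from constants and a single pairing function. It is a naive fixpoint computation, swapped in here for illustration only, and does not achieve the complexity of the rule-based algorithm discussed in this section. The function names are hypothetical; constants are represented as strings and pairs as 2-tuples.

```python
from typing import Dict, Iterable, Set, Tuple, Union

Term = Union[str, Tuple["Term", "Term"]]  # a constant or a pair


def collect_subterms(term: Term, acc: Set[Term]) -> None:
    """Add term and all of its subterms to acc."""
    acc.add(term)
    if isinstance(term, tuple):
        for child in term:
            collect_subterms(child, acc)


def congruent(equations: Iterable[Tuple[Term, Term]], s: Term, t: Term) -> bool:
    """Decide whether s = t follows from the ground equations."""
    equations = list(equations)
    terms: Set[Term] = set()
    for lhs, rhs in equations:
        collect_subterms(lhs, terms)
        collect_subterms(rhs, terms)
    collect_subterms(s, terms)
    collect_subterms(t, terms)

    parent: Dict[Term, Term] = {u: u for u in terms}

    def find(u: Term) -> Term:
        while parent[u] != u:
            u = parent[u]
        return u

    def union(a: Term, b: Term) -> bool:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        parent[ra] = rb
        return True

    for lhs, rhs in equations:
        union(lhs, rhs)

    # Repeat until no two pairs with equivalent components remain separate.
    changed = True
    while changed:
        changed = False
        signature: Dict[Tuple[Term, Term], Term] = {}
        for u in terms:
            if isinstance(u, tuple):
                key = (find(u[0]), find(u[1]))
                if key in signature:
                    changed = union(u, signature[key]) or changed
                else:
                    signature[key] = u
    return find(s) == find(t)
```

For instance, from the single equation a = b the sketch derives that the pairs (a, c) and (b, c) are equal, which is exactly the congruence property stated above.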

The inference rules in Figure 2 are compatible with the union-find module
$U$ (assuming that rules in $U$ have priority higher than the rules (C4)-(C7) that
use the output predicate $\to^{!}$ of $U$). Combined with $U$, they give an $O(n \log n)$
algorithm for congruence closure. The rules are related to rules given in [3] which
view congruence closure as a form of ground completion. The ternary atoms
$\langle x, y \rangle \Rightarrow z$ represent the signatures in [5], or the definitions in [3]. Note that, by
contrast to [3], we do not introduce new constants to denote the subterms of
the input equations. The terms on the right side of $\Rightarrow$ play the role of these
constants. This explains rule (C3) where $z \Rightarrow z$ gives us $z$ as the handle to all
terms that are semantically equal to $z$. These rules have lower priority than all
rules in the union-find module, with the priority among them corresponding to
the order in which the rules are given, (C1) having highest priority and
(C7) having lowest. The precedence of (C2) over (C3), (C4) over (C6) and (C5)
over (C7) ensures the invariant that for any pair $\langle x, y \rangle$ there is at most one
assertion of the form $\langle x, y \rangle \Rightarrow z$. Furthermore, one can check that in any state
visible to these rules, and hence where no union-find rules are still to be run,
we have that if the state contains $\langle x, y \rangle \Rightarrow z$ then $\langle x, y \rangle$ and $z$ have the same
find value. Furthermore, the rules maintain the invariant that in states visible
to (C4) through (C7), if the state contains find($\langle x, y \rangle$) then there exist an $x'$
and $y'$ such that $x \to^{!} x'$ and $y \to^{!} y'$ and the state contains $\langle x', y' \rangle \Rightarrow z$ where
$z$ is equivalent to (has the same find as) $\langle x, y \rangle$. In any final (saturated) state we
must have that $x'$ and $y'$ are normal forms under the equivalence generated by
the union assertions. This implies that in the final state, if we have find($\langle x_1, y_1 \rangle$)
and find($\langle x_2, y_2 \rangle$) where $x_1$ and $x_2$ are equivalent and $y_1$ and $y_2$ are equivalent
we must also have $\langle f_1, f_2 \rangle \Rightarrow z$ where $f_1$ is the common find of $x_1$ and $x_2$, $f_2$
is the common find of $y_1$ and $y_2$, and where $z$ is equivalent to both input pairs,
and hence the input pairs are equivalent to each other.

The union-find module satisfies the condition that for any given $x$ there are
at most $\log n$ terms $y$ such that $x \to^{*} y$. This implies that an initial assertion
$\langle x, y \rangle \Rightarrow z$ can generate at most $2 \log n$ “descendants” of the form $\langle x', y' \rangle \Rightarrow z$.
This implies that at most $2n \log n$ assertions of this form are ever generated
and this implies that each of the rules (C4), (C5), (C6), and (C7) has at most
$2n \log n$ prefix firings. The other rules have at most $O(n)$ prefix firings. Hence
we have the following theorem.

Let $C$ be the module with input predicates union and find and output predicate
$\to^{!}$, resulting from combining the union-find module with the congruence
closure rules.

Theorem 3. If $R$ is a rule set and $D$ an initial database where both $R$ and
$D$ are compatible with $C$ then, in any computation from $D$ of $R \cup C$, the total
number of prefix firings of rules in $C$ is $O(m + n \log n)$ where $m$ is the number
of input union assertions, that is, union assertions in $D$ or generated by $R$, and
$n$ is the number of terms $x$ appearing in find assertions (either in input find
assertions or find assertions generated by (C1)).

Note that in this case, n is proportional to the number of the different subterms 
in input find assertions. The complexity bound given by the theorem is the same 
as the one given in [5]. The latter paper, however, ignores the work needed for 
processing the input equations. Our inference rules do include this preprocessing 
and, therefore, come with an additional additive $O(m)$ term in the complexity
bound. 



5 Satisfiability of Ground Horn Clauses with Equality 

We now extend congruence closure (in a compatible manner) to handle ground 
(object-level) Horn clauses represented as assertions in the input database D. 
(The meta-level Horn clauses are called inference rules.) More specifically we 
want to construct an algorithm for computing the deductive closure of a set of
ground assertions of the form input($\Phi \to A$) where the possible expressions for
$\Phi$ and $A$ are defined by the following grammar where $c$ ranges over constants
and $p$ ranges over binary predicate symbols including the special symbol = for
denoting formal equality.

$\Phi ::= A \mid \Phi_1 \wedge \Phi_2$          $A ::= \top \mid p(t_1, t_2)$          $t ::= c \mid \langle t_1, t_2 \rangle$

The algorithm takes as input a set $D$ of ground assertions of the form $\Phi \to A$.
The algorithm uses the congruence closure module and all inference rules in this
section run at priority lower than those in the congruence closure module.

We start with the following linear time module for ground Horn clauses 
without equality. The module may be viewed as a high-level implementation of 
the algorithm in [4]. The main idea in this set of rules is that atoms appearing
in the antecedent of clauses are first detached from their clauses. This has the 
effect that even if an atom has many occurrences in antecedents of clauses it is 
nevertheless only derived once. 



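$$(I1)\quad \frac{\mathrm{input}(\Phi \to A)}{\mathrm{antecedent}(\Phi),\ \mathrm{conclusion}(A),\ \mathrm{true}(\Phi \to A)}$$

$$(I2)\quad \frac{\mathrm{antecedent}(\Phi_1 \wedge \Phi_2)}{\mathrm{antecedent}(\Phi_1),\ \mathrm{antecedent}(\Phi_2)}$$

$$(I3)\quad \frac{}{\mathrm{true}(\top)}$$

$$(I4)\quad \frac{\mathrm{true}(\Phi \to \Psi) \qquad \mathrm{true}(\Phi)}{\mathrm{true}(\Psi)}$$

$$(I5)\quad \frac{\mathrm{antecedent}(\Phi_1 \wedge \Phi_2) \qquad \mathrm{true}(\Phi_1),\ \mathrm{true}(\Phi_2)}{\mathrm{true}(\Phi_1 \wedge \Phi_2)}$$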
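The idea that each atom is derived only once, however many antecedents it occurs in, can be illustrated with a minimal sketch. This is not the rule set above but a conventional counter-based forward-chaining procedure; the clause encoding (pairs of a tuple of antecedent atoms and a conclusion atom) and the name `horn_closure` are assumptions made for this illustration only.

```python
from collections import defaultdict, deque


def horn_closure(clauses):
    """Return the set of atoms derivable from ground Horn clauses.

    clauses: iterable of (antecedent_atoms, conclusion) pairs; an empty
    antecedent tuple plays the role of the constant T.
    """
    # Drop duplicate atoms inside one antecedent so the counters stay exact.
    clauses = [(tuple(dict.fromkeys(body)), head) for body, head in clauses]
    remaining = [len(body) for body, _ in clauses]   # unproved atoms per clause
    watchers = defaultdict(list)                     # atom -> clauses using it
    for idx, (body, _) in enumerate(clauses):
        for atom in body:
            watchers[atom].append(idx)

    true_atoms = set()
    agenda = deque(head for (body, head), n in zip(clauses, remaining) if n == 0)
    while agenda:
        atom = agenda.popleft()
        if atom in true_atoms:        # each atom is processed only once
            continue
        true_atoms.add(atom)
        for idx in watchers[atom]:
            remaining[idx] -= 1
            if remaining[idx] == 0:   # whole antecedent is now true
                agenda.append(clauses[idx][1])
    return true_atoms
```

Every occurrence of an atom in an antecedent is visited at most once, so the work is linear in the total size of the clauses. Satisfiability can then be read off by checking whether a designated atom such as `fail` is in the returned set.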



A natural way to extend these rules to handle equality would be to treat 
atoms themselves as terms and apply congruence closure. This would give a 
simple $O(m \log m)$ algorithm for conditional equations, where m is the size (in
dag representation) of the set of clauses. If m is quadratic in the number n
of different terms appearing in the set, that would give the bound $O(n^2 \log n)$.
However, for the particular application given in section 6 a more refined bound, 
and more refined algorithm, is needed. We handle equality with the following 
rules. 



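$$(I6)\quad \frac{\mathrm{true}(=(s, t))}{\mathrm{union}(s, t)}$$

$$(I7)\quad \frac{\mathrm{antecedent}(p(s, t))}{\mathrm{find}(s),\ \mathrm{find}(t),\ \mathrm{push}(p(s, t))}$$

$$(I8)\quad \frac{\mathrm{conclusion}(p(s, t))}{\mathrm{find}(s),\ \mathrm{find}(t),\ \mathrm{push}(p(s, t))}$$

$$(I9)\quad \frac{[\mathrm{push}(p(s, t))] \qquad s \Rightarrow s'}{\mathrm{true}(p(s', t) \to p(s, t)),\ \mathrm{true}(p(s, t) \to p(s', t)),\ \mathrm{push}(p(s', t))}$$

$$(I10)\quad \frac{[\mathrm{push}(p(s, t))] \qquad t \Rightarrow t'}{\mathrm{true}(p(s, t') \to p(s, t)),\ \mathrm{true}(p(s, t) \to p(s, t')),\ \mathrm{push}(p(s, t'))}$$

$$(I11)\quad \frac{[\mathrm{push}(=(s, s))]}{\mathrm{true}(=(s, s))}$$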



The rules have priority in the order given, with (I1) having highest priority
and (I11) having lowest, but with all rules at lower priority than any rules in
the congruence closure module. We leave it to the reader to verify that any
saturation of these rules contains a given conclusion if and only if that
conclusion follows from the input under the standard interpretation of equality.
Here we focus on the run time (number of prefix firings) of these rules.

Let m be the number of antecedents as derived by rules (I1) and (I2), plus
the number of clauses in the input, and let n and a, respectively, be the number
of different terms and atoms appearing there. Clearly, a is in $O(m)$, and also
in $O(n^2)$, with $m, n \le |D|$. The number of prefix firings of rules (I1), (I2),
(I3), (I5), (I7), and (I8) are all proportional to m. The number of prefix firings
of (I4) is proportional to m plus the number of firings of (I9) and (I10). The
number of prefix firings of (I6) is proportional to m plus the number of firings
of rule (I11). The number of prefix firings of (I11) is bounded by the number of
prefix firings of (I7) and (I8) (which is m) plus the number of prefix firings of
(I9) and (I10). So the total number of prefix firings is proportional to m plus
the number of firings of the two rules (I9) and (I10). Since the relation in the
second premise of these two rules has out-degree at most one, we immediately
get that these two rules have at most $n^2$ firings, where n is the number of
subterms appearing in the input. By the properties of union-find we also get
that those rules have at most $a \log n$ firings. The number of union operations
generated by rule (I6) is at most $a \log n$, hence in $O(m \log n)$. The number of
prefix firings inside the congruence-closure module is therefore
$O((m + n) \log n)$. So the total number of prefix firings is
$O(m + n \log n + \min(m \log n, n^2))$.

Theorem 4. Satisfiability of ground Horn clauses with equality can be decided
in time $O(|D| + n \log n + \min(m \log n, n^2))$, where m is the number of
antecedents and input clauses and n is the number of terms.

The above bound is better than $O(m \log m)$ in any family of problems where m
is in $\Omega(n^2)$. In that case, also $|D|$ is in $\Omega(n^2)$ and the algorithm becomes linear in
the size of the input. In cases where the length of antecedents is bounded by a 
constant, m is proportional to the number of input clauses in D. 
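The linearity claim can be checked in a few lines. The following is a sketch under the stated assumption that m is in $\Omega(n^2)$, written as $n^2 \le c\,m$ for a constant c that is introduced here only for the calculation, together with $m \le |D|$ from the discussion preceding Theorem 4.

```latex
% Assumptions: n^2 \le c\,m for some constant c, m \le |D|, and n \ge 1.
\begin{align*}
|D| + n\log n + \min(m\log n,\ n^2)
  &\le |D| + n^2 + n^2   && (n\log n \le n^2,\ \min(\cdot,\ n^2) \le n^2)\\
  &\le |D| + 2c\,m       && (n^2 \le c\,m)\\
  &\le (1 + 2c)\,|D|     && (m \le |D|)
\end{align*}
```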



6 Henglein’s Quadratic Typability Algorithm 

Following the exposition by [10], the typability problem in a variant of the
Abadi-Cardelli object calculus [1] considered by [8] can be taken to consist of a
given set of assertions of the form $\sigma \le \tau$, accepts($\sigma$, l), and
notaccepts($\sigma$, l), where $\sigma$ and $\tau$ are type names and l is a message
name. The instance is acceptable (solvable) provided that the following rules do
not derive fail. We also assume that type($\sigma$) is derivable for those type
terms $\sigma$ that appear in [not]accepts- or $\le$-facts in the input, or which
are of the form $\tau.l$ with accepts($\tau$, l) appearing in the input. Moreover,
we assume the standard reflexivity, symmetry, transitivity, and congruence
properties of equality.

a < T accepts{a, 1) type{u) 

type{p) accepts{T,l) accepts{a,l) 

fu-np(a) '^-P (tCt notaccepts{a,l) 



(T T cr 



cr C p 



a.l = T.I 



fail

To determine solvability of a problem instance we can now simply build all
ground Horn clauses that result from resolving the bodies of the first three rules
with the [not]accepts- and the $\le$-facts in the input, and also with the type-
facts derived from the input database. For the last rule we generate the ground
instances by resolving with the type-facts and instantiating with all label terms
in the input. If the size of the problem instance is m, the input contains $O(m)$
accepts- and input-facts and terms. Hence we obtain $O(m^2)$ resolvents, which
are ground clauses in which $O(m)$ terms appear. We now have that the input is
solvable if and only if one cannot derive fail from these ground clauses together
with the facts in the input.¹ Applying Theorem 4 we get a novel and simple proof
of Henglein’s result that solvability is decidable in $O(m^2)$ time.
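The quadratic bound follows by substituting the sizes just described into Theorem 4. In this sketch N denotes the size of the problem instance (called m in the paragraph above); the renaming is ours and only avoids a clash with the parameter m of Theorem 4. We also assume, as in the remark following Theorem 4, that antecedents have bounded length, so that the number of antecedents is proportional to the number of ground clauses.

```latex
% Assumption: N is the instance size; |D|, m, n are the parameters of Theorem 4.
\begin{align*}
|D| \in O(N^2), \qquad m \in O(N^2), \qquad n \in O(N)
\end{align*}
\begin{align*}
O\bigl(|D| + n\log n + \min(m\log n,\ n^2)\bigr)
  &\subseteq O\bigl(N^2 + N\log N + \min(N^2\log N,\ N^2)\bigr) \\
  &= O(N^2)
\end{align*}
```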

7 Proof of the Meta-complexity Theorem 

In this section we prove Theorem 1. It turns out to be convenient to prove the
theorem for a slightly more general language. We define a literal-based rule to
be a rule of the form $A_1 \wedge \cdots \wedge A_n \to C$ where each $A_i$ and C
are literals, i.e., either atoms or negations of atoms. We write
$S \stackrel{r}{\to} S'$ if there exists a ground substitution $\sigma$ defined on
all the variables in r such that for each antecedent A of r we have that
$\sigma(A)$ is visible in S and S' is $S \cup \{\sigma(C)\}$ where C is the
conclusion of r. We then define the notions of a rule being applicable to a
state, a state being visible to a rule, an R-computation, an R-saturated state,
and a prefix firing as in the case for rules with deletions (in both cases we
allow rules to have priorities). We now show that any rule set with priorities
and deletions can be translated to a literal-based rule set in a way that allows
computations to also be translated in a way that preserves the number of prefix
firings up to a constant factor. In particular, we translate a rule with
deletions of the form $A_1 \wedge \cdots \wedge A_n \to C$ to the following set of
literal-based rules, where p is a fresh predicate symbol, $x_1, \ldots, x_n$ are
all variables in the rule, and $A_{i_1}, \ldots, A_{i_k}$ are all the deleted
antecedents.



$$A_1 \wedge \cdots \wedge A_n \to p(x_1, \ldots, x_n)$$

$$p(x_1, \ldots, x_n) \to \neg A_{i_1}$$

$$\vdots$$

$$p(x_1, \ldots, x_n) \to \neg A_{i_k}$$

$$p(x_1, \ldots, x_n) \to C$$





The first rule above has the same priority as the translated rule. The other
rules are called “transient” rules. Note that an “atomic” invocation of one of the
original rules gets translated into a sequence of intermediate states where some,
but not all, of the deletions and insertions have been made. We must ensure that
these “transient” states are not visible to other rules in the system. This can
be done by assigning all transient rules higher priority than all rules in the set
being translated. It now suffices to prove Theorem 1 for rules over literals rather
than rules with deletions.

¹ One needs to show that further ground clauses are redundant. Suppose, for instance,
we have accepts(s, l) and accepts(t, l) in the input. Then by the process just described
we generate the ground clause $s \sqsubseteq t \to s.l = t.l$. If we later derive s = s', the
clause $s' \sqsubseteq t \to s'.l = t.l$ is a consequence of the clause we already have.

We now perform source-to-source transformations on rules over literals to put
the rules in a simplified form without increasing the number of prefix firings by
more than a constant factor. First we convert rules such that they have at most
three literals. If r is a rule over literals
$A_1 \wedge A_2 \wedge \cdots \wedge A_n \to C$ with n > 2 then we replace r by the
following set of rules, where $P_1, P_2, \ldots, P_n$ are fresh predicate symbols
and $x_1, \ldots, x_{k_i}$ are the variables occurring in the first i antecedents.
The predicate $P_i$ represents the relation defined by the first i antecedents, and
$\neg P_i$ represents the negation (retraction) of $P_i$.

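$$A_1 \to P_1(x_1, \ldots, x_{k_1})$$

$$\neg A_1 \to \neg P_1(x_1, \ldots, x_{k_1})$$

$$P_1(x_1, \ldots, x_{k_1}) \wedge A_2 \to P_2(x_1, \ldots, x_{k_2})$$

$$P_2(x_1, \ldots, x_{k_2}) \wedge \neg P_1(x_1, \ldots, x_{k_1}) \to \neg P_2(x_1, \ldots, x_{k_2})$$

$$P_2(x_1, \ldots, x_{k_2}) \wedge \neg A_2 \to \neg P_2(x_1, \ldots, x_{k_2})$$

$$\vdots$$

$$P_{n-1}(x_1, \ldots, x_{k_{n-1}}) \wedge A_n \to P_n(x_1, \ldots, x_{k_n})$$

$$P_n(x_1, \ldots, x_{k_n}) \wedge \neg P_{n-1}(x_1, \ldots, x_{k_{n-1}}) \to \neg P_n(x_1, \ldots, x_{k_n})$$

$$P_n(x_1, \ldots, x_{k_n}) \wedge \neg A_n \to \neg P_n(x_1, \ldots, x_{k_n})$$

$$P_n(x_1, \ldots, x_{k_n}) \to C$$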

Since we are now proving a version of theorem 1 for rules over literals we must 
consider the case where the rule being translated has negative antecedents. In 
that case the above rules might include rules with doubly negated antecedents. 
Such rules are simply dropped from the translation since negative literals can 
not be deleted (or overruled). The last rule above is given the same priority as 
the rule being translated. All other rules are given higher priority (but lower 
than the priority of any original rule with priority higher than the rule that is 
translated) where the priority is in the order given, i.e., the first rule has highest 
priority and so on. This is possible since the original rules have all different 
priorities. 

To prove the version of theorem 1 for rules over literals it suffices to show that 
any computation of the translated rule set can be mapped back to a computation 
of the original rule set with no more than a constant factor reduction in the 
number of prefix firings. In particular, if the new rule set derives
$P_i(x_1, \ldots, x_{k_i})$ then there must exist a single state in the computation
of the original rule set where $A_1, \ldots, A_i$ all hold under the corresponding
variable substitution. This follows from the observation that the priority
assignment guarantees that in states visible to the rule deriving
$P_i(x_1, \ldots, x_{k_i})$ the predicate $P_{i-1}$ is guaranteed to have the
appropriate meaning as a function of the predicates used in the original
antecedents.

We have now shown that we can assume without loss of generality that each
rule contains at most two antecedents. We now put the rules in an even more
restricted form. For any rule r with two antecedents $A_1 \wedge A_2 \to C$ we
replace r by the following set of rules, where $x_1, \ldots, x_n$ are all variables
occurring in $A_1$ but not $A_2$, $y_1, \ldots, y_m$ are all variables that occur in
both $A_1$ and $A_2$, and $z_1, \ldots, z_k$ are all variables that occur in $A_2$
but not $A_1$. The predicates P and Q and the function symbols f, g, and h are
all fresh.

$$A_1 \to P(f(x_1, \ldots, x_n),\ g(y_1, \ldots, y_m))$$

$$\neg A_1 \to \neg P(f(x_1, \ldots, x_n),\ g(y_1, \ldots, y_m))$$

$$A_2 \to Q(g(y_1, \ldots, y_m),\ h(z_1, \ldots, z_k))$$

$$\neg A_2 \to \neg Q(g(y_1, \ldots, y_m),\ h(z_1, \ldots, z_k))$$

$$P(f(x_1, \ldots, x_n),\ g(y_1, \ldots, y_m)) \wedge Q(g(y_1, \ldots, y_m),\ h(z_1, \ldots, z_k)) \to C$$

The rules are given priority in the order given with the last rule having the same 
priority as the rule being translated. Again the validity of the translation relies 
on the observation that in any state visible to a rule using one of the newly 
introduced predicates P and Q in antecedents, these predicates must have the 
intended meaning as a function of the underlying original predicates and hence 
there is a corresponding firing of the original rule. 

Without loss of generality we now need only prove the theorem for prioritized
inference rules over literals where each rule either has only a single antecedent
or is of the form $P(t_1, t_2) \wedge Q(t_2, t_3) \to C$, where $t_1$, $t_2$, and $t_3$
do not share variables and where the rules maintain the invariant that for all
derivable ground assertions of the form $P(s_1, s_2)$ we have that $s_1$ is a
substitution instance of $t_1$ and $s_2$ is a substitution instance of $t_2$, and
for all derivable ground assertions of the form $Q(s_2, s_3)$ we have that $s_2$ is
a substitution instance of $t_2$ and $s_3$ is a substitution instance of $t_3$. For
such rule sets we can use the algorithm shown below to compute an R-saturation
of a given initial database D.

Algorithm to Compute R{D): 

Assume that D is in fully shared dag representation in which term equality can
be checked in constant time. We maintain queues $Q_p$ and $R_p$ for each priority
p in R. Initialize S to be D and place every element of D on every queue $Q_p$.
Initialize all queues $R_p$ to be empty.

While some queue is nonempty do the following:

Let p be the highest priority such that either $Q_p$ or $R_p$ is nonempty.
(The current state is visible to rules of priority p.) If $Q_p$ is nonempty
then remove a literal $\Phi$ from $Q_p$ and, if $\Phi$ is visible in S, notice
$\Phi$ at priority p using the procedure given below. If $Q_p$ is empty then
remove a pair $(r, \sigma)$ from $R_p$. (Here r is a rule of priority p and
$\sigma$ is a substitution assigning ground values to all variables of r such
that for each antecedent A of r we have $\sigma(A) \in S$.) If $\sigma(A)$ is
visible in S for each antecedent A of r then let $\Psi$ be the assertion
$\sigma(C)$ where C is the conclusion of r, add $\Psi$ to S, and place $\Psi$ on
all queues of the form $Q_{p'}$.

Algorithm to Notice $\Phi$ at priority p:

(The current state is visible to all rules of priority p and $\Phi$ is visible in S.)

1. For each single-antecedent rule r of priority p of the form $A \to C$ determine
whether there is a substitution $\sigma$ such that $\sigma(A) = \Phi$ and, if so,
add the pair $(r, \sigma)$ to $R_p$.

2. For each two-antecedent rule r of priority p of the form
$P(t_1, t_2) \wedge Q(t_2, t_3) \to C$ do the following:

(a) If $\Phi$ has the form $P(s_1, s_2)$ then for each $s_3$ such that
$Q(s_2, s_3)$ is visible in S add the pair $(r, \sigma)$ to $R_p$, where $\sigma$
is the substitution mapping $t_1$ to $s_1$, $t_2$ to $s_2$, and $t_3$ to $s_3$.
(We are guaranteed that $t_1$, $t_2$, and $t_3$ do not share variables and that
$t_1$ matches $s_1$, $t_2$ matches $s_2$, and $t_3$ matches $s_3$.)

(b) If $\Phi$ has the form $Q(s_2, s_3)$ then for each $s_1$ such that
$P(s_1, s_2)$ is visible in S add the pair $(r, \sigma)$ to $R_p$, where $\sigma$
is defined as in (a). (Analogous guarantees also exist in this case.)

We leave it to the reader to verify the correctness and running time of this 
algorithm. The main feature of the algorithm is that the processing of a given rule 
r is restricted to states visible to r. By incrementally maintaining appropriate 
indices it is possible to run steps (2a) and (2b) in time proportional to the 
number of values of $s_3$ and $s_1$ defined in those steps respectively. For instance,
the set of substitutions $s_3$ such that $Q(s_2, s_3)$ is visible in S (cf. step 2a) has
to be indexed using term $s_2$ as key, so that upon adding a new Q-atom to S
the index can be updated in constant time. Note that since we have assumed 
dag representations for expressions under which equality testing is a unit time 
operation, all matching operations for patterns in R take unit time. 
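The indexing described here can be sketched as a pair of hash maps keyed by the shared term. This is only an illustration: the class name `JoinIndex` and its methods are hypothetical, terms are assumed to be hashable values (standing in for nodes of the shared dag), and the sketch ignores priorities, deletions, and the visibility test for negative literals.

```python
from collections import defaultdict


class JoinIndex:
    """Index the recorded P(s1, s2) and Q(s2, s3) atoms by the shared term s2."""

    def __init__(self):
        self.p_by_s2 = defaultdict(set)   # s2 -> {s1 : P(s1, s2) recorded}
        self.q_by_s2 = defaultdict(set)   # s2 -> {s3 : Q(s2, s3) recorded}

    def add_p(self, s1, s2):
        """Step (2a): record P(s1, s2); return the matching triples (s1, s2, s3)."""
        self.p_by_s2[s2].add(s1)
        return [(s1, s2, s3) for s3 in self.q_by_s2[s2]]

    def add_q(self, s2, s3):
        """Step (2b): record Q(s2, s3); return the matching triples (s1, s2, s3)."""
        self.q_by_s2[s2].add(s3)
        return [(s1, s2, s3) for s1 in self.p_by_s2[s2]]
```

Each insertion costs one expected constant-time hash update plus time proportional to the number of triples returned, which corresponds to the number of pairs added to $R_p$ in the respective step.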



8 Future Work 

We have presented a more refined concept of logic programming where the em- 
phasis is on guaranteed execution time bounds linear in the number of prefix 
firings of its rules. We are optimistic that this logic will continue to prove itself 
useful in the design of algorithms. We have demonstrated some of the potential of 
the method by giving a novel algorithm for testing satisfiability of ground Horn 
clauses with equality and shown that in many cases its complexity is not worse
than (unconditional) congruence closure. On top of this we have implemented
an abstract version of Henglein’s type analysis, confirming the quadratic upper 
bound that Henglein obtained before. In fact we believe that program analysis is 
a particularly fruitful area for applying our method. This point was illustrated 
in detail in [10]. With the methods in the present paper we are able to also deal 
with congruences that appear in such analyses in a logical way. 

There are many directions into which this work should be extended. Theo- 
rem 4 should be generalized to cases of input clauses with variables. However, 
already in the given form it is useful for local (equational) theories in the sense 
of [7,9,6].

The relation between deletions and redundancy elimination, with redundancy
in the sense of [2] as entailment from smaller atoms, should be explored. For
instance, the rule (I9) deletes push(s, t) if s can be reduced to s' after the
reduced atom push(s', t) has been generated. The deleted atom is “entailed” by
the “smaller” assertion push(s', t) together with the assertion that s reduces
to s'. The elimination of such redundancies is stable under enrichments of a
state or deletions of other redundant
atoms. Therefore, if only redundant premises are deleted, priorities are irrelevant 
for the correctness of the algorithm and only affect its complexity. 

The concept of priorities for rules should be refined in an instance-based 
manner, allowing different instances of a rule to have different priorities. That 
would give one direct means of formalizing algorithms that would normally have 
to be defined via priority queues. For instance, minimal spanning trees can be 
computed by a two-rule program on top of union-find if rules referring to edges 
in graphs could be processed in an order related to their associated costs.


Abstract. Canonical propositional Gentzen-type systems are systems
which in addition to the standard axioms and structural rules have only 
pure logical rules which have the subformula property, introduce exactly 
one occurrence of a connective in their conclusion, and no other occur- 
rence of any connective is mentioned anywhere else in their formulation. 
We provide a constructive coherence criterion for the non-triviality of 
such systems, and show that a system of this kind admits cut elimination
iff it is coherent. We show also that the semantics of such systems
is provided by non-deterministic two-valued matrices (2-Nmatrices).
2-Nmatrices form a natural generalization of the classical two-valued
matrix, and every coherent canonical system is sound and complete for one
of them. Conversely, with any 2-Nmatrix it is possible to associate a co- 
herent canonical Gentzen-type system which has for each connective at 
most one introduction rule for each side, and is sound and complete for 
that 2-Nmatrix. We show also that every coherent canonical Gentzen- 
type system either defines a fragment of the classical two-valued logic, 
or a logic which has no finite characteristic matrix. 



1 Introduction 

There is a long tradition starting from [Gen69] according to which the meaning 
of a connective is determined by the introduction and elimination rules which are 
associated with it.¹ The supporters of this thesis usually have in mind Natural
Deduction systems of an ideal type. In this type of “canonical” systems each 
connective has its own introduction and elimination rules, which should meet 
the following conditions: in a rule for a connective o, this connective should 
be mentioned exactly once, and no other connective should be involved. The 
rules should also be pure (in the sense of [Avr91]). Unfortunately, already the 
handling of classical negation requires rules which are not canonical in this sense. 
This problem was solved by Gentzen himself by moving to what is now known as 
Gentzen-type systems or sequential calculi. These calculi employ in their classical 
version multiple-conclusion two-sided sequents, and instead of introduction and 
elimination rules they use left introduction rules and right introduction rules. 
The intuitive notions of “canonical form of a rule” and “canonical system” can
be adapted to such systems in a straightforward way, and it is well known that
the usual classical connectives can indeed be fully characterized by canonical
Gentzen-type rules. Moreover, although this can be done in several ways, in all
of them the cut-elimination theorem obtains.

¹ See e.g. [Hod86] and [Sun86] for discussions and references.

In this paper we shall considerably generalize these known facts. We shall 
define “canonical” Gentzen-type rules and systems in precise terms, and provide 
a constructive coherence criterion for their non-triviality. We then show that a 
canonical system admits cut-elimination iff it is coherent. Moreover: we show that 
any coherent set of canonical introduction rules for a connective o completely 
determines the meaning of o. For this we shall need however to generalize the 
usual semantics of classical logic. 

The structure of the rest of this paper is as follows: In section 2 we review
some basic concepts related to logics. In section 3 we define canonical
Gentzen-type systems, formulate the coherence criterion for their non-triviality,
and investigate some special important types of them. In section 4 we introduce
non-deterministic two-valued matrices (2-Nmatrices). These are a generalization
of the classical two-valued (deterministic) matrices, and they provide the se- 
mantics of coherent canonical Gentzen-type systems. In the same section and in 
section 5 we show how to associate a 2-Nmatrix with every coherent canonical 
system G, so that G is sound and complete for that 2-Nmatrix. This allows us to 
prove that every system of this type admits cut elimination, and it defines either 
a fragment of the classical two-valued logic, or a logic which has no finite charac- 
teristic matrix. In section 6 we show that the connection works also in the other 
direction: with any 2-Nmatrix it is possible to associate a coherent canonical 
Gentzen-type system G which is sound and complete for that 2-Nmatrix. More- 
over: for this we can confine ourselves to systems which have for any connective 
at most one left introduction rule and at most one right introduction rule, and 
these rules can be given a particularly concise normal form. We conclude the 
paper with some remarks and directions for further research. 

2 Preliminaries 

In what follows $\mathcal{L}$ is a propositional language with a finite set of
connectives, W is its set of wffs, $\psi, \varphi$ denote arbitrary formulas (of
$\mathcal{L}$), and $\Gamma, \Delta$ denote sets of formulas. We assume also that
the atomic formulas of $\mathcal{L}$ are $p_1, p_2, \ldots$

Definition 1.

1. [Sco74] A (Scott) consequence relation (scr for short) for $\mathcal{L}$ is a
binary relation $\vdash$ between sets of formulas of $\mathcal{L}$ that satisfies
the following conditions:

s-R strong reflexivity: if $\Gamma \cap \Delta \neq \emptyset$ then $\Gamma \vdash \Delta$.

M monotonicity: if $\Gamma \vdash \Delta$ and $\Gamma \subseteq \Gamma'$, $\Delta \subseteq \Delta'$ then $\Gamma' \vdash \Delta'$.

C cut: if $\Gamma \vdash \psi, \Delta$ and $\Gamma', \psi \vdash \Delta'$ then $\Gamma, \Gamma' \vdash \Delta, \Delta'$.

2. $\vdash$ is finitary if the following condition holds for all
$\Gamma, \Delta \subseteq W$: if $\Gamma \vdash \Delta$ then $\Gamma' \vdash \Delta'$
for some finite $\Gamma' \subseteq \Gamma$ and $\Delta' \subseteq \Delta$. $\vdash$ is
uniform if for every uniform substitution $\sigma$ and every $\Gamma$ and $\Delta$,
if $\Gamma \vdash \Delta$ then $\sigma(\Gamma) \vdash \sigma(\Delta)$. $\vdash$ is
consistent (or non-trivial) if there exist non-empty $\Gamma$ and $\Delta$ s.t.
$\Gamma \nvdash \Delta$.

3. A propositional logic is a pair $\langle \mathcal{L}, \vdash \rangle$, where
$\mathcal{L}$ is a propositional language and $\vdash$ is a uniform consistent scr
for $\mathcal{L}$.

Note: There are exactly four inconsistent finitary scrs in any given language:
the one in which $\Gamma \vdash \Delta$ iff $\Gamma$ and $\Delta$ are non-empty; the
one in which $\Gamma \vdash \Delta$ iff $\Gamma$ is non-empty; the one in which
$\Gamma \vdash \Delta$ iff $\Delta$ is non-empty; and the one in which
$\Gamma \vdash \Delta$ for all $\Gamma$ and $\Delta$. All of them should be
considered trivial, and are excluded from our definition of a logic.



3 Canonical Gentzen-Type Systems 

Definition 2.

1. A Gentzen-type system G is standard if its set of axioms includes the
standard axioms $\Gamma, \psi \Rightarrow \psi, \Delta$ and it has all the standard
structural rules (including cut).²

2. Let G be a standard Gentzen-type system. The scr $\vdash_G$ which is induced by
G is defined by: $\Gamma \vdash_G \Delta$ iff the sequent $\Gamma \Rightarrow \Delta$
is provable in G.

3. A standard Gentzen-type system G is consistent if $\vdash_G$ is consistent.

From now on by a “calculus” we shall mean a standard Gentzen-type
calculus, and $\Gamma$ and $\Delta$ will denote finite sets of formulas.

In an ideal Gentzen-type system (of which the usual systems for classical 
logic provide the principal examples) every logical rule should be an introduc- 
tion rule for one connective, it should introduce exactly one occurrence of that 
connective in its conclusion, and no other occurrence of that connective or any 
other connective should be mentioned anywhere else in its formulation. More- 
over: the rule should be pure (i.e., there should be no side conditions limiting 
its application), and its side formulas should be immediate subformulas of the 
principal formula. The next definition formulates this idea in exact terms, and 
provides a method for describing such rules. 

Definition 3.

1. A canonical rule of arity n is an expression of the form
$\{\Pi_i \Rightarrow \Sigma_i\}_{1 \le i \le m} / C$, where $m \ge 0$, C is either
$\circ(p_1, p_2, \ldots, p_n) \Rightarrow$ or $\Rightarrow \circ(p_1, p_2, \ldots, p_n)$
for some connective $\circ$ (of arity n), and for all $1 \le i \le m$,
$\Pi_i \Rightarrow \Sigma_i$ is a clause such that
$\Pi_i, \Sigma_i \subseteq \{p_1, p_2, \ldots, p_n\}$.³

2. An application of a canonical rule
$\{\Pi_i \Rightarrow \Sigma_i\}_{1 \le i \le m} / \circ(p_1, \ldots, p_n) \Rightarrow$
is any inference step of the form:

$$\frac{\{\Gamma_i, \Pi_i^* \Rightarrow \Delta_i, \Sigma_i^*\}_{1 \le i \le m}}{\Gamma, \circ(\psi_1, \ldots, \psi_n) \Rightarrow \Delta}$$

where $\Pi_i^*$ and $\Sigma_i^*$ are obtained from $\Pi_i$ and $\Sigma_i$
(respectively) by substituting $\psi_j$ for $p_j$ (for all $1 \le j \le n$),
$\Gamma_i, \Delta_i$ are any sets of formulas, $\Gamma = \bigcup_{i=1}^{m} \Gamma_i$,
and $\Delta = \bigcup_{i=1}^{m} \Delta_i$. An application of a canonical rule with
a conclusion of the form $\Rightarrow \circ(p_1, \ldots, p_n)$ is defined similarly.

Note: While sequents are written in a metalanguage for $\mathcal{L}$ (which
includes the extra symbol $\Rightarrow$), a canonical rule is formulated in a
meta-meta language of $\mathcal{L}$ (which includes one further extra symbol: /).

² This means that we can take $\Gamma, \Delta$ in a sequent $\Gamma \Rightarrow \Delta$ to be finite sets of formulas.
³ By a clause we mean a sequent which consists of atomic formulas only. When
propositional clauses are written in this way, resolution and cut amount to the
same thing. $\{p_1, p_2, \ldots, p_n\}$ are, recall, the first n atomic formulas.

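Example 1. The two usual introduction rules for the classical conjunction can
be formulated as the following canonical rules:
$\{p_1, p_2 \Rightarrow\} / p_1 \wedge p_2 \Rightarrow$ and
$\{\Rightarrow p_1,\ \Rightarrow p_2\} / \Rightarrow p_1 \wedge p_2$.
Applications of these rules have the form:

$$\frac{\Gamma, \psi, \varphi \Rightarrow \Delta}{\Gamma, \psi \wedge \varphi \Rightarrow \Delta} \qquad \frac{\Gamma \Rightarrow \Delta, \psi \qquad \Gamma' \Rightarrow \Delta', \varphi}{\Gamma, \Gamma' \Rightarrow \Delta, \Delta', \psi \wedge \varphi}$$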

Definition 4. A standard calculus is called canonical if in addition to the stan- 
dard axioms and the standard structural rules it has only canonical logical rules. 

A given canonical calculus may be simplified in various ways. In later sections 
we will go deeper into questions concerning simplifications and normalizations 
of rules and calculi. For our immediate purposes we shall need only very obvious 
simplifications: 

Definition 5. A canonical rule is called superfluous if its set of premises is 
classically inconsistent (which is the case iff it is possible to obtain the empty 
clause from it using resolutions (= cuts)). A logical rule in a canonical calculus 
G is called redundant in G if its set of premises is a superset of the set of 
premises of another rule of G which has the same conclusion. 

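Example 2. A rule with the set of premises
$\{p_1, p_2 \Rightarrow\ ,\ \ p_1 \Rightarrow p_2\ ,\ \ \Rightarrow p_1\}$ is
superfluous. If a calculus G has the two rules
$\{\Rightarrow p_1\} / \Rightarrow \circ(p_1, p_2)$ and
$\{\Rightarrow p_1,\ \Rightarrow p_2\} / \Rightarrow \circ(p_1, p_2)$
then the latter rule is redundant in G.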

Proposition 1. Let G be a canonical calculus, and let G' be the calculus that 
is obtained from G by deleting superfluous and redundant rules. Then G' is 
equivalent to G. Moreover, every sequent that has a cut-free proof in G' also has 
such a proof in G. 

Proof. An application of a superfluous rule in G can be simulated in G' by using 
cuts on its premises followed by a single weakening. The rest of the proposition 
is trivial.

For every canonical Gentzen-type system G, the relation $\vdash_G$ (see Definition
2) defined by G is obviously a uniform (and finitary) scr. However, in order to
ensure that $\langle \mathcal{L}, \vdash_G \rangle$ is a logic, we need to impose
some constraints on the set of rules of G. The following definition provides a
constructive equivalent of the consistency condition:

Definition 6. A canonical calculus G is called coherent if for every two rules
$S_1 / \circ(p_1, p_2, \ldots, p_n) \Rightarrow$ and
$S_2 / \Rightarrow \circ(p_1, p_2, \ldots, p_n)$ of G, the set of clauses
$S_1 \cup S_2$ is classically inconsistent (and so the empty clause can be derived
from it using cuts).



Example 3. The two classical rules for conjunction described in Example 1 form
a coherent set of rules. Here $S_1 = \{p_1, p_2 \Rightarrow\}$,
$S_2 = \{\Rightarrow p_1,\ \Rightarrow p_2\}$ and so $S_1 \cup S_2$ is the classically
inconsistent set $\{p_1, p_2 \Rightarrow\ ,\ \ \Rightarrow p_1\ ,\ \ \Rightarrow p_2\}$.



Example 4. Let T be the famous “Tonk” connective of Prior ([Pri60]). It is
defined in our framework by the following pair of rules:
$\{p_1 \Rightarrow\} / p_1 T p_2 \Rightarrow$ and
$\{\Rightarrow p_2\} / \Rightarrow p_1 T p_2$. This pair is not coherent, since
$\{p_1 \Rightarrow\ ,\ \ \Rightarrow p_2\}$ is a classically consistent set of
clauses. The resulting calculus is of course inconsistent.
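The coherence test of Definition 6 can be tried out on these examples with a small sketch. It decides classical consistency by brute force over all two-valued valuations rather than by the resolution (cut) derivations mentioned in the definition, so it is exponential in the arity and meant only as an illustration. The clause encoding and the function names are assumptions of this sketch, and the premise sets in the usage note follow our reading of Examples 3 and 4.

```python
from itertools import product


def clause(left=(), right=()):
    """Encode the clause 'left => right' as a pair of frozensets of atom names."""
    return (frozenset(left), frozenset(right))


def classically_consistent(clauses):
    """Return True iff some two-valued valuation satisfies every clause."""
    clauses = list(clauses)
    atoms = sorted(set().union(*(l | r for l, r in clauses)))
    for values in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        # 'Pi => Sigma' holds iff some atom of Pi is false or some atom of Sigma is true.
        if all(any(not v[a] for a in l) or any(v[a] for a in r) for l, r in clauses):
            return True
    return False


def coherent_pair(left_rule_premises, right_rule_premises):
    """Definition 6 for one left rule S1 and one right rule S2 of the same connective."""
    return not classically_consistent(list(left_rule_premises) + list(right_rule_premises))
```

For instance, `coherent_pair([clause(left=["p1", "p2"])], [clause(right=["p1"]), clause(right=["p2"])])` checks the conjunction rules, and `coherent_pair([clause(left=["p1"])], [clause(right=["p2"])])` checks the Tonk rules.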



Proposition 2. Every consistent canonical calculus is coherent.

Proof. Suppose there are two rules $S_1 / \circ(p_1, \ldots, p_n) \Rightarrow$ and
$S_2 / \Rightarrow \circ(p_1, \ldots, p_n)$ such that $S_1 \cup S_2$ is classically
consistent. Then there is a classical valuation v that satisfies $S_1 \cup S_2$.
Let $\Pi' = \{p_i \mid 1 \le i \le n,\ v(p_i) = t\}$ and
$\Sigma' = \{p_i \mid 1 \le i \le n,\ v(p_i) = f\}$. Let
$S'_j = \{\Pi, \Pi' \Rightarrow \Sigma, \Sigma' \mid \Pi \Rightarrow \Sigma \in S_j\}$
for j = 1, 2. $S'_1$ and $S'_2$ are sets of standard axioms (because v satisfies
$\Pi \Rightarrow \Sigma$, there is some $p_i \in \Sigma$ such that $v(p_i) = t$ or
some $p_i \in \Pi$ such that $v(p_i) = f$. In the former case, $p_i \in \Pi'$, and
in the latter case, $p_i \in \Sigma'$). By applying the first rule to $S'_1$ we
obtain $\Pi', \circ(p_1, \ldots, p_n) \Rightarrow \Sigma'$ and by applying the second
rule to $S'_2$ we obtain $\Pi' \Rightarrow \Sigma', \circ(p_1, \ldots, p_n)$. By cut,
$\Pi' \Rightarrow \Sigma'$ is provable. Since $\Pi' \Rightarrow \Sigma'$ is a clause,
$\Pi' \cap \Sigma' = \emptyset$, and the calculus is uniform, $p \Rightarrow q$ is
provable for all $p \neq q$. The uniformity of the calculus and the closure under
weakening entail that $\Gamma \Rightarrow \Delta$ is provable for every non-empty
$\Gamma$ and $\Delta$. Hence the system is not consistent.

The converse of this proposition will be shown in Corollary 2 below. Note 
that coherence is also a necessary condition for cut elimination: 

Proposition 3. A canonical calculus which admits cut elimination is coherent. 

Proof. In canonical systems, clauses which are not axioms can be proved only 
by using cuts on non-atomic formulas. Thus, if a canonical calculus admits cut 
elimination it must be consistent and hence coherent.

Definition 7.

1. A canonical rule of arity n is called separated if each of its premises is a
unit clause, i.e. it has the form $p_i \Rightarrow$ or $\Rightarrow p_i$ for some
$1 \le i \le n$.

2. A separated rule $\{\Pi_i \Rightarrow \Sigma_i\}_{1 \le i \le m} / C$ of arity n
is called full if m = n and for every $1 \le i \le n$,
$\Pi_i \cup \Sigma_i = \{p_i\}$.

Example 5. $\{p_1, p_2 \Rightarrow\} / p_1 \wedge p_2 \Rightarrow$ is not separated.
$\{p_1 \Rightarrow\} / p_1 \wedge p_2 \Rightarrow$ is separated, but not full.
$\{p_1 \Rightarrow,\ \Rightarrow p_2\} / p_1 \wedge p_2 \Rightarrow$ is full.



Definition 8. A canonical calculus is called separated (full) if all its logical
rules are separated (full), and none of them is superfluous or redundant. 

Note that a full canonical calculus G is coherent simply iff no two rules of G 
for the same connective have the same set of premises but different conclusions. 

As a first step in proving our general cut-elimination theorem we shall show 
that it suffices to prove it for full canonical calculi: 

Proposition 4. Every canonical calculus G has an equivalent full canonical 
calculus G'. Moreover: if G is coherent then so is G', and every sequent that
has a cut-free proof in G' also has a cut-free proof in G.

Proof. We shall first explain the process of transforming a canonical calculus 
to an equivalent full canonical calculus by using an example. The transition is 
made in two stages: first, every canonical rule is split into separated rules, and 
then each of these rules is split into full rules. 

Example 6. Take the classical introduction rules for conjunction:

$[\wedge \Rightarrow]$  $\{p_1, p_2 \Rightarrow\} / p_1 \wedge p_2 \Rightarrow$

$[\Rightarrow \wedge]$  $\{\Rightarrow p_1\ ,\ \Rightarrow p_2\} / \Rightarrow p_1 \wedge p_2$

The second rule is already full. The first is neither full nor separated. We can
replace it by the following pair of separated rules:

$\{p_1 \Rightarrow\} / p_1 \wedge p_2 \Rightarrow$

$\{p_2 \Rightarrow\} / p_1 \wedge p_2 \Rightarrow$

This pair is equivalent to the original rule. Indeed, given the two rules, by
applying the first to $\Gamma, \psi_1, \psi_2 \Rightarrow \Delta$ we obtain $\Gamma, \psi_1 \wedge \psi_2, \psi_2 \Rightarrow \Delta$. By applying
the second to this sequent we obtain $\Gamma, \psi_1 \wedge \psi_2 \Rightarrow \Delta$. The other direction of
the equivalence is obvious in view of weakening. Moreover: a given cut-free proof
that uses the new rules can trivially be transformed into a cut-free proof that
uses the original rule.

Next, the first of the two new rules can again be replaced by the following
pair of full rules:

$\{p_1 \Rightarrow\ ,\ p_2 \Rightarrow\} / p_1 \wedge p_2 \Rightarrow$

$\{p_1 \Rightarrow\ ,\ \Rightarrow p_2\} / p_1 \wedge p_2 \Rightarrow$

This pair is equivalent to the original (separated) rule. Indeed, using the two
new rules, the original one can be simulated as follows. Given $\Gamma, \psi_1 \Rightarrow \Delta$, an
application of the first of the two new rules to this sequent and to the standard
axiom $\psi_2 \Rightarrow \psi_2$ yields $\Gamma, \psi_1 \wedge \psi_2 \Rightarrow \Delta, \psi_2$. By applying the second of the two
new rules to this sequent and to $\Gamma, \psi_1 \Rightarrow \Delta$ we obtain $\Gamma, \psi_1 \wedge \psi_2 \Rightarrow \Delta$. The other
direction of the equivalence is obvious, and again does not use the cut rule.

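The second separated rule above is similarly split into the following full rules:

$\{p_2 \Rightarrow\ ,\ p_1 \Rightarrow\} / p_1 \wedge p_2 \Rightarrow$

$\{p_2 \Rightarrow\ ,\ \Rightarrow p_1\} / p_1 \wedge p_2 \Rightarrow$

To conclude, from the two original classical rules for conjunction, we
obtain the following equivalent set of four full rules:

$\{p_1 \Rightarrow\ ,\ p_2 \Rightarrow\} / p_1 \wedge p_2 \Rightarrow$

$\{p_1 \Rightarrow\ ,\ \Rightarrow p_2\} / p_1 \wedge p_2 \Rightarrow$

$\{\Rightarrow p_1\ ,\ p_2 \Rightarrow\} / p_1 \wedge p_2 \Rightarrow$

$\{\Rightarrow p_1\ ,\ \Rightarrow p_2\} / \Rightarrow p_1 \wedge p_2$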

The general procedure for replacing a non-separated rule R by an equivalent 
set of separated ones is to put first in this set every separated rule which can be 
obtained by selecting exactly one formula from each premise of R (preserving its 
side). We then use Proposition 1 to remove any superfluous or redundant rule. 
The general procedure for splitting a given separated rule R of arity n into an
equivalent set of full rules is to put in this set every full rule which has the same
conclusion as R, and whose set of premises is an extension of that of R (thus if
R has m < n premises, it will be split into $2^{n-m}$ full rules). Using the methods
employed in the last example, it is easy to see that these transformations preserve
coherence of a system as well as its set of provable sequents, and that any
cut-free proof in the resulting system can be simulated by a cut-free proof in the
original one (it is not difficult to directly show that the converse is also true, but
this will be proved later using a semantic argument).

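Example 7. Suppose we have a canonical rule for a ternary connective:

$\{p_1, p_2 \Rightarrow\ ,\ p_1 \Rightarrow p_2\ ,\ p_3 \Rightarrow p_2\} / \diamond(p_1, p_2, p_3) \Rightarrow$

The first stage of the above process produces the following rules:

(1) $\{p_1 \Rightarrow\ ,\ p_3 \Rightarrow\} / \diamond(p_1, p_2, p_3) \Rightarrow$

(2), (4) $\{p_1 \Rightarrow\ ,\ \Rightarrow p_2\} / \diamond(p_1, p_2, p_3) \Rightarrow$

(3) $\{p_1 \Rightarrow\ ,\ \Rightarrow p_2\ ,\ p_3 \Rightarrow\} / \diamond(p_1, p_2, p_3) \Rightarrow$

(5) $\{p_2 \Rightarrow\ ,\ p_1 \Rightarrow\ ,\ p_3 \Rightarrow\} / \diamond(p_1, p_2, p_3) \Rightarrow$

(6) $\{p_2 \Rightarrow\ ,\ p_1 \Rightarrow\ ,\ \Rightarrow p_2\} / \diamond(p_1, p_2, p_3) \Rightarrow$

(7) $\{p_2 \Rightarrow\ ,\ \Rightarrow p_2\ ,\ p_3 \Rightarrow\} / \diamond(p_1, p_2, p_3) \Rightarrow$

(8) $\{p_2 \Rightarrow\ ,\ \Rightarrow p_2\} / \diamond(p_1, p_2, p_3) \Rightarrow$

(6), (7), (8) are superfluous and we discard them. (3), (5) are redundant because
of rules (1), (2), and we discard them as well. The next stage is to extend (1), (2)
into full rules:

$\{p_1 \Rightarrow\ ,\ p_2 \Rightarrow\ ,\ p_3 \Rightarrow\} / \diamond(p_1, p_2, p_3) \Rightarrow$

$\{p_1 \Rightarrow\ ,\ \Rightarrow p_2\ ,\ p_3 \Rightarrow\} / \diamond(p_1, p_2, p_3) \Rightarrow$

$\{p_1 \Rightarrow\ ,\ \Rightarrow p_2\ ,\ \Rightarrow p_3\} / \diamond(p_1, p_2, p_3) \Rightarrow$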
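
The two-stage procedure from the proof of Proposition 4 can be sketched in Python as follows. This is only an illustrative sketch: the encoding of clauses and the names `separate` and `make_full` are assumptions made here, and "superfluous" and "redundant" are read off from how rules (3) and (5)-(8) are treated in Example 7 (Proposition 1 itself is stated earlier in the paper).

```python
from itertools import product

# A unit clause is a pair (i, side): (i, "L") stands for "p_i =>",
# (i, "R") stands for "=> p_i". A general premise is a pair (left, right)
# of sets of atom indices. All of this encoding is hypothetical.

def separate(premises):
    """Stage 1: pick one atom from every premise (keeping its side), then prune."""
    choices = [[(a, "L") for a in left] + [(a, "R") for a in right]
               for left, right in premises]
    candidates = {frozenset(pick) for pick in product(*choices)}
    # Assumed reading of "superfluous": some atom occurs on both sides,
    # as in rules (6)-(8) of Example 7.
    kept = {c for c in candidates
            if not any((a, "L") in c and (a, "R") in c for a, _ in c)}
    # Assumed reading of "redundant": a proper superset of another premise
    # set with the same conclusion, as in rules (3) and (5).
    return {c for c in kept if not any(d < c for d in kept)}

def make_full(separated, n):
    """Stage 2: extend each separated premise set to all n atoms in every way."""
    full = set()
    for prem in separated:
        used = {a for a, _ in prem}
        free = [a for a in range(1, n + 1) if a not in used]
        for sides in product("LR", repeat=len(free)):
            full.add(prem | frozenset(zip(free, sides)))
    return full

# The rule of Example 7: {p1,p2 => ; p1 => p2 ; p3 => p2} / <>(p1,p2,p3) =>
example7 = [({1, 2}, set()), ({1}, {2}), ({3}, {2})]
separated = separate(example7)
full = make_full(separated, 3)
```

On the rule of Example 7 the first stage keeps exactly rules (1) and (2), and the second stage yields the three full rules listed above, since the two extensions share one rule.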

Notation 1 Let G be a canonical calculus. $G^F$ will denote the equivalent full
calculus obtained by the process described in the proof of Proposition 4.

We shall later show that if G is coherent then $G^F$ is the unique full canonical
calculus that is equivalent to G.

Corollary 1. If cut elimination obtains for every coherent full canonical calculus
then it obtains for every coherent canonical calculus.

Proof. This follows from Proposition 4. 

It remains therefore to show that cut elimination obtains for every coherent 
full canonical calculus. For that, we shall need to use some semantic arguments. 
The corresponding semantics will be described next. 

4 Semantics: Two-Valued Non-deterministic Matrices

For the semantics of coherent canonical calculi we need some structures which 
generalize the ordinary concept of a multi-valued matrix. The idea behind this 
generalization is to allow non-deterministic computations of truth-values. Thus 
the value that a valuation assigns in these structures to a complex formula is 
not always uniquely determined by the values that it assigns to its subformulas, 
but can be chosen non-deterministically from a certain non-empty set of options. 
The precise definition is as follows: 

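Definition 9. [AL00] A non-deterministic matrix (Nmatrix for short) for $\mathcal{L}$ is
a tuple $\mathcal{M} = (\mathcal{T}, \mathcal{D}, \mathcal{O})$, where $\mathcal{T}$ is a non-empty set of truth values, $\mathcal{D}$ is a
non-empty proper subset of $\mathcal{T}$ (its designated values), and for every n-ary connective
$\diamond$ of $\mathcal{L}$, $\mathcal{O}$ includes a corresponding n-ary function $\tilde{\diamond}$ from $\mathcal{T}^n$ to $2^{\mathcal{T}} - \{\emptyset\}$.
A valuation in $\mathcal{M}$ is a function $v : \mathcal{W} \to \mathcal{T}$ that satisfies the condition: if $\diamond$ is an n-ary
connective, and $\psi_1, \ldots, \psi_n \in \mathcal{W}$, then $v(\diamond(\psi_1, \ldots, \psi_n)) \in \tilde{\diamond}(v(\psi_1), \ldots, v(\psi_n))$.
$v$ satisfies a formula $\psi$ in $\mathcal{M}$ ($v \models^{\mathcal{M}} \psi$) if $v(\psi) \in \mathcal{D}$. $v$ is a model of $\Gamma$ in $\mathcal{M}$
($v \models^{\mathcal{M}} \Gamma$) if it satisfies every formula in $\Gamma$. $\Delta$ follows from $\Gamma$ in $\mathcal{M}$ ($\Gamma \vdash_{\mathcal{M}} \Delta$)
if for every model $v$ of $\Gamma$ in $\mathcal{M}$, $v \models^{\mathcal{M}} \varphi$ for some $\varphi \in \Delta$. If $\vdash$ is an scr then
$\mathcal{M}$ is called a characteristic Nmatrix for $\vdash$ if $\vdash = \vdash_{\mathcal{M}}$.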

Notes:

1. Every (deterministic) matrix^4 can be identified with an Nmatrix whose
functions in $\mathcal{O}$ always return singletons.

2. It is easy to verify that if $\mathcal{M}$ is an Nmatrix for $\mathcal{L}$ then $(\mathcal{L}, \vdash_{\mathcal{M}})$ is a logic. In
[AL00] it is proved that if $\mathcal{M}$ is finite then this logic is necessarily finitary
(i.e., the compactness theorem obtains for it).

^4 See e.g. [Urq86].

In this paper we shall use a special type of Nmatrices: those with exactly two 
truth values (which may be identified with the classical truth values). We shall 
show in fact that there is a strong connection between such Nmatrices and 
coherent canonical Gentzen-type calculi.^5

Notation 2 A two-valued Nmatrix in which $\mathcal{T} = \{t, f\}$ and $\mathcal{D} = \{t\}$ will be
called a 2-Nmatrix.
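
To make the non-deterministic semantics concrete, the following Python sketch enumerates legal valuations of a 2-Nmatrix and tests $\Gamma \vdash_{\mathcal{M}} \Delta$ by brute force. It is a sketch under stated assumptions: the formula encoding and the names `legal_valuations` and `follows` are hypothetical, and restricting attention to the subformulas of $\Gamma \cup \Delta$ assumes that every legal assignment on a subformula-closed set extends to a full valuation (plausible because every table entry is non-empty, but not argued in this section). The sample table is the one that appears later in Example 9.

```python
# Formulas: an atom is a string; a compound is a tuple
# (connective, arg_1, ..., arg_n). This encoding is an assumption.

def subformulas(phi, acc):
    """Append the subformulas of phi to acc, arguments before compounds."""
    if isinstance(phi, tuple):
        for arg in phi[1:]:
            subformulas(arg, acc)
    if phi not in acc:
        acc.append(phi)
    return acc

def legal_valuations(formulas, tables, values=("t", "f")):
    """Yield every legal assignment to the subformulas of `formulas`."""
    subs = []
    for phi in formulas:
        subformulas(phi, subs)

    def extend(i, v):
        if i == len(subs):
            yield dict(v)
            return
        phi = subs[i]
        if isinstance(phi, tuple):
            # Non-determinism: any value in the table entry may be chosen.
            options = tables[phi[0]][tuple(v[a] for a in phi[1:])]
        else:
            options = values
        for x in options:
            v[phi] = x
            yield from extend(i + 1, v)
        v.pop(phi, None)

    yield from extend(0, {})

def follows(gamma, delta, tables, designated=("t",)):
    """True iff every legal valuation satisfying gamma satisfies some formula of delta."""
    for v in legal_valuations(list(gamma) + list(delta), tables):
        if all(v[g] in designated for g in gamma) and \
           not any(v[d] in designated for d in delta):
            return False
    return True

# A close relative of conjunction: the value for (t, f) is left open.
conj_like = {"o": {("t", "t"): {"t"}, ("t", "f"): {"t", "f"},
                   ("f", "t"): {"f"}, ("f", "f"): {"f"}}}
```

With this table, `follows([("o", "p", "q")], ["p"], conj_like)` holds while the same query with `"q"` on the right does not, which reflects the open entry for the arguments (t, f).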



Notation 3 For $x \in \{t, f\}$, denote: $-x = f$ if $x = t$, and $-x = t$ if $x = f$.

Notation 4 The expression $\Phi, a^x$ will denote $\Phi \cup \{a\}$ if $x = t$ and $\Phi$ if $x = f$
(note that $\Phi$ might be empty here).

Notation 5 The full canonical rule $\{p_i^{-x_i} \Rightarrow p_i^{x_i}\}_{1 \le i \le n} / \diamond(p_1, \ldots, p_n) \Rightarrow$
where $x_1, \ldots, x_n \in \{t, f\}$ will be denoted below by either $[\diamond(x_1, \ldots, x_n) : f]$ or by
$[\diamond(x_1, \ldots, x_n) \Rightarrow]$. The rule with the same premises but with the complementary
conclusion will be denoted by $[\diamond(x_1, \ldots, x_n) : t]$ or by $[\Rightarrow \diamond(x_1, \ldots, x_n)]$.



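Definition 10. Let G be a coherent canonical calculus. The 2-Nmatrix that is
defined by G is the following: For each n-ary connective $\diamond$ and every $x_1, \ldots, x_n \in \{t, f\}$
we define

$$\tilde{\diamond}(x_1, \ldots, x_n) = \begin{cases} \{y\} & \text{if } [\diamond(x_1, \ldots, x_n) : y] \text{ is a rule of } G^F \text{ (for } y \in \{t, f\}\text{)} \\ \{t, f\} & \text{otherwise} \end{cases}$$

(This is well-defined since we assume that G (and hence $G^F$) is coherent.)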



Example 8. Suppose G has only one rule for the ternary connective $\diamond$: the one
given at the beginning of Example 7. Then the three final rules obtained in that
example determine the following interpretation of $\diamond$: $\tilde{\diamond}(f, f, f) = \tilde{\diamond}(f, t, f) = \tilde{\diamond}(f, t, t) = \{f\}$
(and $\tilde{\diamond}(x_1, x_2, x_3) = \{t, f\}$ for all other $x_1, x_2, x_3$).
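
Definition 10 amounts to a table lookup, which the following Python sketch illustrates on Example 8. The encoding of a full rule $[\diamond(x_1, \ldots, x_n) : y]$ as a pair `(xs, y)` and both function names are assumptions made for this illustration; the coherence check uses the characterization of coherence for full calculi noted after Definition 8.

```python
from itertools import product

def nmatrix_from_full_rules(rules, n):
    """Build the truth table of one n-ary connective from its full rules.

    rules: set of pairs (xs, y), standing for the rule [<>(x_1..x_n) : y].
    """
    table = {}
    for xs in product("tf", repeat=n):
        ys = {y for zs, y in rules if zs == xs}
        if len(ys) > 1:
            # Same premises, different conclusions: the calculus is incoherent.
            raise ValueError(f"incoherent rules for arguments {xs}")
        table[xs] = ys or {"t", "f"}
    return table

def full_rules_from_nmatrix(table):
    """Converse direction: one full rule per deterministic table entry."""
    return {(xs, next(iter(ys))) for xs, ys in table.items() if len(ys) == 1}

# The three full rules obtained at the end of Example 7.
example8 = {(("f", "f", "f"), "f"), (("f", "t", "f"), "f"), (("f", "t", "t"), "f")}
table = nmatrix_from_full_rules(example8, 3)
```

The resulting table has three entries equal to `{"f"}` and five entries equal to `{"t", "f"}`, and converting it back recovers the three rules.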



Proposition 5. Every coherent canonical calculus G is sound for the 2-Nmatrix 
that it defines. 

Proof. By Proposition 4 and Definition 10, we may assume w.l.o.g. that G is
full. Let $\mathcal{M}$ be the 2-Nmatrix that is defined by G. We say that a valuation $v$ in
$\mathcal{M}$ satisfies a sequent $\Gamma \Rightarrow \Delta$ if $v(\psi) = t$ for some $\psi \in \Delta$, or $v(\psi) = f$ for some
$\psi \in \Gamma$. It is easy to verify that:

(1) A valuation $v$ satisfies a sequent $\Gamma, \psi^{-x} \Rightarrow \Delta, \psi^{x}$ iff either $v$ satisfies $\Gamma \Rightarrow \Delta$,
or $v(\psi) = x$.

(2) $\Gamma \vdash_{\mathcal{M}} \Delta$ iff every valuation $v$ in $\mathcal{M}$ satisfies $\Gamma \Rightarrow \Delta$.

Consider now an application of the rule $[\diamond(x_1, \ldots, x_n) : y]$, and assume that $v$
is a valuation which satisfies all the premises $\{\Gamma_i, \psi_i^{-x_i} \Rightarrow \Delta_i, \psi_i^{x_i}\}_{1 \le i \le n}$ of this
application. Then $v$ satisfies also its conclusion, $\Gamma, \psi^{-y} \Rightarrow \Delta, \psi^{y}$ (where
$\psi = \diamond(\psi_1, \ldots, \psi_n)$). Indeed, either
$v$ satisfies $\Gamma_i \Rightarrow \Delta_i$ for some $i$, and hence also $\Gamma \Rightarrow \Delta$, or else $v(\psi_i) = x_i$ for all
$1 \le i \le n$, and since $\tilde{\diamond}(x_1, \ldots, x_n) = \{y\}$, necessarily $v(\psi) = y$. In both cases (1)
entails that $v$ satisfies $\Gamma, \psi^{-y} \Rightarrow \Delta, \psi^{y}$. It follows by (2) that $[\diamond(x_1, \ldots, x_n) : y]$
is sound for $\mathcal{M}$.

^5 Valuations in two-valued Nmatrices form a special type of what are called
bivaluations in [Bez99]. Another related idea is Meyer's metavaluations (see e.g. [Dun86]).



Corollary 2. A canonical calculus is consistent iff it is coherent.

Proof. The “only if” part is Proposition 2. The converse follows from Proposition
5, and the fact that every Nmatrix induces a consistent logic.

Corollary 3. The consistency of a canonical calculus is decidable.

5 Completeness and Cut-Elimination 

Notation 6 For $x \in \{t, f\}$, denote: $ite(x, A, B)$ = if $x$ then $A$ else $B$ (i.e. $A$ if $x = t$ and $B$ if $x = f$). Note:
$ite(x, A, B) = ite(-x, B, A)$.



Theorem 7. Every coherent full canonical calculus admits cut elimination, and 
it is complete for the 2-Nmatrix that it defines. 

Proof. By Proposition 5, we can prove completeness and cut elimination together
by showing that if $\Gamma \Rightarrow \Delta$ does not have a cut-free proof in a coherent full
canonical calculus G, then $\Gamma \nvdash_{\mathcal{M}} \Delta$, where $\mathcal{M}$ is the 2-Nmatrix defined by G.
For this extend first $\Gamma \Rightarrow \Delta$ to a sequent $\Gamma^* \Rightarrow \Delta^*$ with the following properties:

1. $\Gamma \subseteq \Gamma^*$ and $\Delta \subseteq \Delta^*$.

2. $\Gamma^* \Rightarrow \Delta^*$ does not have a cut-free proof in G.

3. For every rule $[\diamond(x_1, \ldots, x_n) : y]$ in G, if $\diamond(\psi_1, \ldots, \psi_n) \in ite(y, \Delta^*, \Gamma^*)$ then
for some $1 \le i \le n$, $\psi_i \in ite(x_i, \Delta^*, \Gamma^*)$.

This extension is possible, because if a sequent $\Gamma' \Rightarrow \Delta'$ does not have
a cut-free proof and $\psi = \diamond(\psi_1, \ldots, \psi_n) \in ite(y, \Delta', \Gamma')$ then for some $1 \le i \le n$,
$\Gamma', \psi_i^{-x_i} \Rightarrow \Delta', \psi_i^{x_i}$ does not have a cut-free proof (because otherwise by adding
an application of $[\diamond(x_1, \ldots, x_n) : y]$ to the proofs of these sequents we obtain a
cut-free proof for $\Gamma', \psi^{-y} \Rightarrow \Delta', \psi^{y}$, which is exactly $\Gamma' \Rightarrow \Delta'$).

The refuting valuation is now defined as follows:

For atomic $q$, $v(q) = t$ iff $q \in \Gamma^*$.

$$v(\diamond(\psi_1, \ldots, \psi_n)) = \begin{cases} t & \text{if } \tilde{\diamond}(v(\psi_1), \ldots, v(\psi_n)) = \{t\}, \text{ or } \tilde{\diamond}(v(\psi_1), \ldots, v(\psi_n)) = \{t, f\} \text{ and } \diamond(\psi_1, \ldots, \psi_n) \in \Gamma^* \\ f & \text{otherwise} \end{cases}$$

$v$ is obviously a legal $\mathcal{M}$-valuation. We now show by induction on the complexity
of a formula $\psi \in \Gamma^* \cup \Delta^*$ that if $\psi \in \Gamma^*$ then $v(\psi) = t$, and if $\psi \in \Delta^*$ then
$v(\psi) = f$.

- Assume $\psi$ is atomic. If $\psi \in \Gamma^*$ then $v(\psi) = t$ by definition. If $\psi \in \Delta^*$ then
$\psi \notin \Gamma^*$ by property 2 of $\Gamma^* \Rightarrow \Delta^*$. Hence $v(\psi) = f$.

- Let $\psi = \diamond(\psi_1, \ldots, \psi_n)$ and let $x_i = v(\psi_i)$ for $1 \le i \le n$.

Assume $\psi \in \Gamma^*$, but $v(\psi) = f$. According to the definition of $v$, this can
happen only if $\tilde{\diamond}(x_1, \ldots, x_n) = \{f\}$. It follows that $\psi \in ite(v(\psi), \Delta^*, \Gamma^*)$ in
this case. Hence $\psi_i \in ite(x_i, \Delta^*, \Gamma^*)$ for some $1 \le i \le n$ by property 3 of
$\Gamma^* \Rightarrow \Delta^*$ and the fact that by Definition 10, $\tilde{\diamond}(x_1, \ldots, x_n) = \{f\}$ in $\mathcal{M}$ only
if $[\diamond(x_1, \ldots, x_n) : f]$ is a rule of G. On the other hand $\psi_i \in ite(x_i, \Gamma^*, \Delta^*)$
for all $1 \le i \le n$ by the induction hypothesis. This contradicts property 2 of
$\Gamma^* \Rightarrow \Delta^*$.

Now assume $\psi \in \Delta^*$, but $v(\psi) = t$. According to the definition of $v$, there
are two possibilities here:

1. $\tilde{\diamond}(x_1, \ldots, x_n) = \{t\} = \{v(\psi)\}$. We get from this a contradiction like in
the previous case.

2. $\tilde{\diamond}(x_1, \ldots, x_n) = \{t, f\}$ and $\psi \in \Gamma^*$. Since $\psi \in \Delta^*$ as well, this contradicts
property 2 of $\Gamma^* \Rightarrow \Delta^*$.

By property 1 of $\Gamma^* \Rightarrow \Delta^*$ and what we have just proved, $v$ is a model of $\Gamma$
in $\mathcal{M}$ which does not satisfy any element of $\Delta$. Hence $\Gamma \nvdash_{\mathcal{M}} \Delta$.



Theorem 8. A canonical calculus admits cut elimination iff it is coherent. 

Proof. The “only if” part is just Proposition 3. For the “if” part, suppose $\Gamma \Rightarrow \Delta$
has a proof in a coherent canonical calculus G. Let $\mathcal{M}$ be the 2-Nmatrix that
is defined by G (as well as $G^F$). Then $\Gamma \vdash_{\mathcal{M}} \Delta$ by Proposition 5. Theorem 7
implies therefore that $\Gamma \Rightarrow \Delta$ has a cut-free proof in $G^F$. Hence, by Proposition
4, it also has a cut-free proof in G.



Theorem 9. Every coherent canonical calculus is sound and complete for the 
2-Nmatrix that it defines. 

Proof. This immediately follows from Theorem 7 and Proposition 5. 



Theorem 10. Let G be a consistent canonical calculus. Then either G defines
a logic which is a fragment of classical logic, or it has no finite characteristic
matrix.

Proof. Assume that G is consistent. Then it defines a logic $L(G)$. By Corollary 2
and Theorem 9, $L(G)$ is induced by some 2-Nmatrix $\mathcal{S}$. If $\mathcal{S}$ includes
only deterministic connectives (i.e.: connectives which return singletons for every
combination of truth values), then G has a characteristic two-valued matrix,
and so it is a fragment of classical logic. Otherwise $\mathcal{S}$ has at least one proper
non-deterministic operation, and hence $L(G)$ has no finite characteristic matrix
by Theorem 6 of [AL00].^6

We turn now to some more corollaries of the completeness theorem. First, a
result that was promised at the end of Section 3:

Corollary 4. If G is a coherent canonical calculus then it has a unique
equivalent full canonical calculus.

Proof. Let G be a coherent canonical calculus, and let G' be a full canonical
calculus that is equivalent to G. We need to show that $G' = G^F$. Suppose this
is not the case. Let $\mathcal{M}^F$ and $\mathcal{M}'$ be the 2-Nmatrices that are defined by $G^F$
and G' respectively. Since the two calculi are different then by Definition 10
there is some n-ary connective $\diamond$ and some $x_1, \ldots, x_n \in \{t, f\}$ such that the
interpretation of $\diamond$ on $x_1, \ldots, x_n$ is different in $\mathcal{M}^F$ and in $\mathcal{M}'$. Suppose w.l.o.g.
that in $\mathcal{M}^F$, $\tilde{\diamond}(x_1, \ldots, x_n)$ is $\{f\}$, whereas in $\mathcal{M}'$ it is either $\{t\}$ or $\{t, f\}$. It is
easy to see that $\{p_i \mid x_i = t\} \cup \{\diamond(p_1, \ldots, p_n)\} \vdash_{\mathcal{M}^F} \{p_i \mid x_i = f\}$, while this is
not the case in $\mathcal{M}'$. Hence, by Theorem 7, $G^F$ and G' are not equivalent. This
contradicts the fact that they are both equivalent to G.

Our next results compare strength of rules and introduce a normal form. 

Definition 11. Let $R_1$ and $R_2$ be two canonical rules (in the same language).
We say that $R_1$ is at least as strong as $R_2$ if any application of $R_2$ can be
simulated using $R_1$ together with the standard axioms and structural rules (including
cut). We say that $R_1$ and $R_2$ are equivalent if each of them is at least as strong
as the other.

The characterization below of the strength of a rule can be summarized as 
follows: a rule is stronger when its set of premises is weaker! 

Proposition 6. A canonical rule $S_1/C$ is at least as strong as the canonical
rule $S_2/C$ iff every clause in $S_1$ classically follows from $S_2$ (this is equivalent to
saying that every clause in $S_1$ is subsumed by some clause that can be derived
from the clauses of $S_2$ using resolutions).

Proof. The “if” part can easily be proved directly. The converse can be shown
by using 2-Nmatrices. We omit the details.

Corollary 5. Two canonical rules $S_1/C$ and $S_2/C$ are equivalent if $S_1$ and $S_2$
are classically equivalent (as sets of clauses).



This corollary naturally leads to the following economical normal form for
canonical rules:

Definition 12. A canonical rule is in Resolution Normal Form (RNF) if its
set of premises S does not include a standard axiom, and any resolvent of two
elements of S is subsumed by some other element of S.

Corollary 6. Every canonical rule has an equivalent canonical rule in RNF.
An example of transforming a rule to RNF will be given in the next section.

^6 This theorem states that if $\mathcal{S}$ is a two-valued Nmatrix which has at least one proper
non-deterministic operation, then $\vdash_{\mathcal{S}}$ has no finite characteristic matrix.



6 Calculi for 2-Nmatrices 

In the previous sections we associated with every coherent canonical calculus a 
2-Nmatrix, for which it is sound and complete. In this section we go in the other 
direction, and associate with a given 2-Nmatrix coherent canonical calculi which 
are sound and complete for it. One way of doing so is rather obvious: 

Definition 13. Let $\mathcal{M}$ be a 2-Nmatrix. The full calculus that is defined by $\mathcal{M}$
is the canonical calculus G that has the rule $[\diamond(x_1, \ldots, x_n) : y]$ for each n-ary
connective $\diamond$ and for every $x_1, \ldots, x_n, y \in \{t, f\}$ such that $\tilde{\diamond}(x_1, \ldots, x_n) = \{y\}$.

Note: The full calculus that is defined by a 2-Nmatrix is obviously coherent. 

Proposition 7. The full calculus that is defined by a 2-Nmatrix is sound and 
complete for it. 

Proof. The proof is similar to that of Proposition 5 and Theorem 7. 

We introduce now for any given 2-Nmatrix a calculus of a more regular form. 

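Theorem 11. Every 2-Nmatrix $\mathcal{M}$ has a sound and complete coherent canonical
calculus which for every connective has at most one introduction rule on the
left, and at most one introduction rule on the right.

Proof. Let $G(\mathcal{M})$ be the canonical calculus which for any n-ary connective $\diamond$
has the following rules (where $\tilde{\diamond}$ is the interpretation of $\diamond$ in $\mathcal{M}$):

$[\diamond \Rightarrow]$  $\{\{p_i \mid x_i = t\} \Rightarrow \{p_i \mid x_i = f\} \mid t \in \tilde{\diamond}(x_1, \ldots, x_n)\} / \diamond(p_1, \ldots, p_n) \Rightarrow$

$[\Rightarrow \diamond]$  $\{\{p_i \mid x_i = t\} \Rightarrow \{p_i \mid x_i = f\} \mid f \in \tilde{\diamond}(x_1, \ldots, x_n)\} / \Rightarrow \diamond(p_1, \ldots, p_n)$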

Note that if $t \in \tilde{\diamond}(x_1, \ldots, x_n)$ for all $x_1, \ldots, x_n$ then the first rule is superfluous
and can be discarded, while if $\tilde{\diamond}(x_1, \ldots, x_n) = \{f\}$ for all $x_1, \ldots, x_n$ then that
rule does not have any premises, i.e. it is a non-standard axiom (this type of
axioms is permitted in canonical systems!). Similarly, if $f \in \tilde{\diamond}(x_1, \ldots, x_n)$ for all
$x_1, \ldots, x_n$ then the second rule can be discarded, while if $\tilde{\diamond}(x_1, \ldots, x_n) = \{t\}$ for
all $x_1, \ldots, x_n$ then that rule does not have any premises.

The soundness of $G(\mathcal{M})$ is easy to verify. Take for example $[\diamond \Rightarrow]$. To show its
soundness, assume that $\Gamma, \psi_1^{x_1}, \ldots, \psi_n^{x_n} \vdash_{\mathcal{M}} \Delta, \psi_1^{-x_1}, \ldots, \psi_n^{-x_n}$ for all $x_1, \ldots, x_n$
such that $t \in \tilde{\diamond}(x_1, \ldots, x_n)$, and let $v$ be a model of $\Gamma \cup \{\diamond(\psi_1, \ldots, \psi_n)\}$ in $\mathcal{M}$.
Then there are $y_1, \ldots, y_n \in \{t, f\}$ such that $t \in \tilde{\diamond}(y_1, \ldots, y_n)$ and $v(\psi_i) = y_i$
for all $i$. Since $\Gamma, \psi_1^{y_1}, \ldots, \psi_n^{y_n} \vdash_{\mathcal{M}} \Delta, \psi_1^{-y_1}, \ldots, \psi_n^{-y_n}$ by assumption, it follows
that $v$ is a model of one of the elements of $\Delta$. The soundness of $[\Rightarrow \diamond]$ is proved
similarly, while the proof of completeness is similar to that of Theorem 7.



Corollary 7. Every coherent canonical calculus has an equivalent canonical cal- 
culus in which every connective has at most one introduction rule for each side. 



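Example 9. Suppose we have the following interpretation for a binary connective
$\diamond$, which makes it a very close relative of the classical conjunction:

$\tilde{\diamond}(t, t) = \{t\}$, $\tilde{\diamond}(t, f) = \{t, f\}$, $\tilde{\diamond}(f, t) = \tilde{\diamond}(f, f) = \{f\}$

The corresponding two rules as given in the proof of the last theorem are:

$[\diamond \Rightarrow]$  $\{p_1, p_2 \Rightarrow\ ,\ p_1 \Rightarrow p_2\} / \diamond(p_1, p_2) \Rightarrow$

$[\Rightarrow \diamond]$  $\{p_1 \Rightarrow p_2\ ,\ p_2 \Rightarrow p_1\ ,\ \Rightarrow p_1, p_2\} / \Rightarrow \diamond(p_1, p_2)$

We next transform these two rules into rules in RNF as follows. Consider the
set of premises of the second rule. Its closure under cut is:

$\{p_1 \Rightarrow p_2\ ,\ p_2 \Rightarrow p_1\ ,\ \Rightarrow p_1, p_2\ ,\ \Rightarrow p_1\ ,\ \Rightarrow p_2\ ,\ p_1 \Rightarrow p_1\ ,\ p_2 \Rightarrow p_2\}$

We now discard the last two standard axioms, and remove also the original three
clauses since they are subsumed by $\Rightarrow p_1$ and $\Rightarrow p_2$. A similar process can be
applied to the first rule. We are left with the simpler rules:

$[\diamond \Rightarrow]'$  $\{p_1 \Rightarrow\} / \diamond(p_1, p_2) \Rightarrow$

$[\Rightarrow \diamond]'$  $\{\Rightarrow p_1\ ,\ \Rightarrow p_2\} / \Rightarrow \diamond(p_1, p_2)$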

Note that both rules are frequently used in the literature as introduction rules 
for classical conjunction. 
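
The steps of Example 9 can be replayed mechanically. The Python sketch below first builds the premise sets of the two rules from the proof of Theorem 11 and then simplifies them by closing under cut (resolution), discarding standard axioms, and dropping subsumed clauses. The clause encoding and the names `theorem11_premises` and `simplify` are assumptions of this illustration, and the sketch only mirrors the procedure of the example; it does not claim to settle every corner case of Definition 12.

```python
from itertools import product

# A clause "left => right" is a pair of frozensets of atom indices.

def theorem11_premises(table, n):
    """Premise sets of [<> =>] and [=> <>] for one n-ary connective."""
    def clause(xs):
        left = frozenset(i for i, x in enumerate(xs, 1) if x == "t")
        right = frozenset(i for i, x in enumerate(xs, 1) if x == "f")
        return (left, right)
    rows = list(product("tf", repeat=n))
    left_rule = {clause(xs) for xs in rows if "t" in table[xs]}
    right_rule = {clause(xs) for xs in rows if "f" in table[xs]}
    return left_rule, right_rule

def simplify(clauses):
    """Close under cut, drop standard axioms, keep only unsubsumed clauses."""
    closed = set(clauses)
    while True:
        new = set()
        for (l1, r1), (l2, r2) in product(closed, repeat=2):
            for a in r1 & l2:  # cut on atom a
                new.add((l1 | (l2 - {a}), (r1 - {a}) | r2))
        if new <= closed:
            break
        closed |= new
    proper = {c for c in closed if not (c[0] & c[1])}
    return {c for c in proper
            if not any(d != c and d[0] <= c[0] and d[1] <= c[1] for d in proper)}

conj_like = {("t", "t"): {"t"}, ("t", "f"): {"t", "f"},
             ("f", "t"): {"f"}, ("f", "f"): {"f"}}
left_rule, right_rule = theorem11_premises(conj_like, 2)
left_simplified = simplify(left_rule)
right_simplified = simplify(right_rule)
```

For the table of Example 9 this yields the premise set $\{p_1 \Rightarrow\}$ for the left rule and $\{\Rightarrow p_1\ ,\ \Rightarrow p_2\}$ for the right rule, in agreement with the two simplified rules above.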

Note: Given a 2-Nmatrix $\mathcal{M}$, the system $G(\mathcal{M})$ which is constructed in the
proof of Theorem 11 is a natural basis for a tableaux proof system for validity
in $\mathcal{M}$. In fact, an application of a rule in $G(\mathcal{M})$ backwards is like a step in a
tableau. For example: $[\diamond \Rightarrow]$ says that if $v(\diamond(\psi_1, \ldots, \psi_n)) = t$ then there are
some $x_1, \ldots, x_n \in \{t, f\}$ such that $t \in \tilde{\diamond}(x_1, \ldots, x_n)$ and $v(\psi_i) = x_i$ for all $1 \le i \le n$.

7 Conclusion

We defined canonical calculi, which are the most natural type of multiple-conclusion
Gentzen-type systems, and showed that such calculi are non-trivial iff
they satisfy a certain constructive coherence condition. We introduced the
semantics of two-valued non-deterministic matrices (2-Nmatrices) for such calculi,
and proved that the following are equivalent for any given logic L:

1. L is defined by some coherent, canonical Gentzen-type system. 

2. L is defined by some cut-free, canonical Gentzen-type system. 

3. L is the logic of some 2-Nmatrix. 

One of the by-products of our work is strong evidence for the thesis
according to which the meaning of a connective is given by its introduction (and
“elimination”) rules (in some appropriate deduction system). We have shown
that at least in the framework of multiple-conclusion consequence relations, any
reasonable set of canonical introduction rules completely determines the
semantics of a connective. For this it is not even necessary that the left introduction
rules and the right introduction rules for a given connective precisely “match”
(in the sense of [Bel62] and [Sun86]). It suffices that there be no conflict
between them (where this condition is defined in precise terms).

Obvious directions for further research are the following: 

1. To extend the ideas and results to first order languages. 

2. To develop an analogous framework and theory for single-conclusion conse- 
quence relations and Natural Deduction systems. 

3. To generalize the framework to arbitrary finite n-valued Nmatrices, possibly
using sequents with n components like e.g. in [BFZ94]^7 (see also the survey
papers [BFS00] and [Hah99] for more references and further details).



^7 This paper introduces a special type of canonical systems for sequents with n
components (those that are induced according to a certain procedure by n-valued
deterministic matrices), and proves a general cut elimination theorem for this type of
systems. In the case n = 2 this amounts to a cut elimination theorem for a certain
class of systems that correspond to fragments of classical logic.

References

[AL00] Arnon Avron and Iddo Lev, “Non-deterministic matrices,” 2000. Submitted.

[Avr91] Arnon Avron, “Simple consequence relations,” Information and Computation,
vol. 92, no. 1, pp. 105-139, 1991.

[Bel62] Nuel D. Belnap, “Tonk, plonk and plink,” Analysis, vol. 22, pp. 130-134,
1962.

[Bez99] Jean-Yves Beziau, “Classical negation can be expressed by one of its halves,”
Logic Journal of the IGPL, vol. 7, pp. 145-151, 1999.

[BFS00] Matthias Baaz, Christian G. Fermüller, and Gernot Salzer, “Automated
deduction for many-valued logics,” in Handbook of Automated Reasoning
(A. Robinson and A. Voronkov, eds.), Elsevier Science Publishers, 2000.

[BFZ94] Matthias Baaz, Christian G. Fermüller, and Richard Zach, “Elimination of
cuts in first-order finite-valued logics,” Information Processing Cybernetics,
vol. 29, no. 6, pp. 333-355, 1994.

[Dun86] J. Michael Dunn, “Relevance logic and entailment,” in [GG86], vol. III, ch. 3,
pp. 117-224, 1986.

[Gen69] Gerhard Gentzen, “Investigations into logical deduction,” in The Collected
Works of Gerhard Gentzen (M. E. Szabo, ed.), pp. 68-131, North Holland,
Amsterdam, 1969.

[GG86] Dov M. Gabbay and Franz Guenthner, Handbook of Philosophical Logic. D.
Reidel Publishing Company, 1986.

[Hah99] Reiner Hähnle, “Tableaux for multiple-valued logics,” in Handbook of Tableau
Methods (Marcello D’Agostino, Dov M. Gabbay, Reiner Hähnle, and Joachim
Posegga, eds.), pp. 529-580, Kluwer Publishing Company, 1999.

[Hod86] Wilfrid Hodges, “Elementary predicate logic,” in [GG86], vol. I, ch. 1,
pp. 1-131, 1986.

[Pri60] A. N. Prior, “The runabout inference ticket,” Analysis, vol. 21, pp. 38-9,
1960.

[Sco74] Dana S. Scott, “Completeness and axiomatization in many-valued logics,”
in Proc. of the Tarski symposium, vol. XXV of Proc. of Symposia in Pure
Mathematics, (Rhode Island), pp. 411-435, American Mathematical Society,
1974.

[Sun86] Goran Sundholm, “Proof theory and meaning,” in [GG86], vol. III, ch. 8,
pp. 471-506, 1986.

[Urq86] Alasdair Urquhart, “Many-valued logic,” in [GG86], vol. III, ch. 2,
pp. 71-116, 1986.




Incremental Closure of Free Variable Tableaux 



Martin Giese 

Institut für Logik, Komplexität und Deduktionssysteme,
Universität Karlsruhe, Germany
giese@ira.uka.de



Abstract. This paper presents a technique for automated theorem 
proving with free variable tableaux that does not require backtracking. 
Most existing automated proof procedures using free variable tableaux 
require iterative deepening and backtracking over applied instantiations 
to guarantee completeness. If the correct instantiation is hard to find, 
this can lead to a significant amount of duplicated work. Incremental 
Closure is a way of organizing the search for closing instantiations that 
avoids this inefficiency. 



1 Introduction 

Since the 1980s, the technique of using free variables to postpone the choice of
instantiations in the $\gamma$-expansions of tableau calculi for first-order logic has been used in
practically all implementations. These free variables have to be instantiated at
some point in the proof search by unifying complementary literals on branches,
and one faces the problem that doing this in a naive way can lead to
non-termination for some unsatisfiable sets of formulae, and thus to incompleteness
of the procedure.

The most common way of regaining completeness employs backtracking and
iterative deepening: A complexity limit for the proof is fixed, and a proof that does
not exceed this complexity is sought, using backtracking to explore the search
space. If no proof is found, the limit is increased. Unfortunately, backtracking
can lead to a large amount of duplicated work, because the prover forgets
information which it might need again. On the other hand, the analytic free variable
tableau calculus is proof confluent, meaning that any open tableau for an
unsatisfiable set of formulae may be closed by further expansion. This means that
the calculus does not require backtracking, contrary to connection tableaux, for
instance.

This is probably the main incentive to consider proof procedures that can do 
without backtracking. Another reason is that they are more suited for use in an 
integrated automated and interactive system: The user has more possibility of 
seeing what went wrong in a failed proof attempt, if all information about what 
has been tried so far is kept. 

Lately, a number of tableau-based procedures have been proposed that work
without backtracking. Most of these concentrate on overcoming the mentioned
naivety of simply closing a branch as soon as possible. In [Bil96] for instance,
instead of instantiating free variables in the tableau, the set of input clauses
is extended by instantiated variants, leading to a kind of saturation process. A
similar approach based on the connection calculus is presented in [BEF99]. In
[Bec00], an ordering restriction on the sequence of generated tableaux is imposed
that forbids cycles.

This paper describes an approach in which the free variables are never
instantiated in the tableau, but the various possibilities are effectively considered
in parallel. We use an incremental approach to compute an instantiation of the
free variables that closes all branches simultaneously, hence the name Incremental
Closure.^1

After defining a few basic notions, we shall present the basic idea of the 
approach in Sect. 3. We describe a number of possible refinements in Sect. 4, 
and some experimental results are quoted in Sect. 5. 

2 Preliminaries 

We assume a fixed first-order signature throughout this paper. Let terms and 
first-order formulae (without equality) over that signature be defined in the usual 
way. A ground term is a term that does not contain variables. 

A formula is in negation normal form (NNF), iff negation signs appear only
in front of atomic formulae $p(t_1, \ldots, t_n)$. By the application of de Morgan’s rules,
any formula can be transformed into an equivalent NNF formula. A formula is in
skolemized negation normal form (SNNF), iff it is in NNF and does not contain 
existential quantifiers. Any formula F can be transformed by skolemization into 
a formula F' in SNNF that is satisfiable iff F is satisfiable. A formula is closed 
if all variable occurrences in it are bound by a quantifier. 

Definition 1. An instantiation is a mapping from the set of all variables to 
ground terms. Let Sub° denote the set of all instantiations. 

This differs from the usual concept of a ground substitution, in that we require 
all, i.e. infinitely many variables to be mapped. 

Definition 2. A goal is a finite set of formulae. A tableau is a finite tree where 
every node has zero, one, or two children, and each node is labeled with a goal. 
A leaf is a node with no children. The leaf goals of a tableau are the goals that 
label its leaves. 

A tableau for a finite set of SNNF formulae S is defined inductively as follows: 

1. The tableau consisting of the root node labeled with the goal S is a tableau 
for S, called the initial tableau. 

2. If there is a tableau for S that has a leaf n with goal $\{\alpha_1 \wedge \alpha_2\} \cup G$, then the
tableau obtained by adding a new child n' with goal $\{\alpha_1, \alpha_2\} \cup G$ to n is also
a tableau for S. ($\alpha$-expansion)

3. If there is a tableau for S that has a leaf n with goal $\{\beta_1 \vee \beta_2\} \cup G$, then the
tableau obtained by adding two new children n', resp. n'' with goals $\{\beta_1\} \cup G$,
resp. $\{\beta_2\} \cup G$ to n is also a tableau for S. ($\beta$-expansion)

4. If there is a tableau for S that has a leaf n with goal $\{\forall x.\gamma_1\} \cup G$, then the
tableau obtained by adding a new child n' with goal $\{[x/X]\gamma_1, \forall x.\gamma_1\} \cup G$ to
n, where X did not previously occur in the tableau, is also a tableau for S.
($\gamma$-expansion)

A complementary pair is a pair $\phi$, $\neg\psi$, where $\phi$ and $\psi$ are unifiable atomic
formulae. A goal G is closed under an instantiation $\sigma$, iff there is a
complementary pair $\{\phi, \neg\psi\} \subseteq G$ with $\sigma(\phi) = \sigma(\psi)$. A tableau T is closed under an
instantiation $\sigma$, iff each leaf goal of T is closed under $\sigma$. A tableau is closable iff
it is closed under some instantiation.

^1 A predecessor to this approach was sketched in the Position Paper [Gie00b] under
the name of ‘Instance Streams’.

We use this somewhat unusual formulation of tableaux labeled with sets 
of formulae (Smullyan [Smu68] calls them block tableaux) because it helps in 
describing the procedure. Note that in an implementation, it is sufficient to keep 
the leaf goals in memory; they correspond to the branches in the usual definition. 

Another deviation from the usual formulations of free variable tableau calculi
is that there is no rule that instantiates the free variables introduced by
a $\gamma$-rule. Instead, an instantiation that closes all branches simultaneously has
to be found, to decide that a tableau is closable. This is an important aspect
of the incremental closure technique. It is obvious that the usual correctness
and completeness proofs for free variable tableaux are also applicable to this
formulation.

Proposition 1. Let S be a set of closed formulae in SNNF. S is unsatisfiable 
iff there is a closable tableau for S. 



3 Incremental Closure 

From Prop. 1, it is easy to derive a complete proof procedure: 

    T := initial tableau for S
    while ( not closable(T) ) do
        if expandable(T) then
            select possible expansion of T
            expand T
        else
            answer 'satisfiable'
        end
    end
    answer 'unsatisfiable'

This is a complete proof procedure, provided the selection of tableau expansions
is fair. Being fair means that if the procedure does not terminate, any
extension step possible on a goal will at some point be applied on that goal or
one of its descendants. In particular, in a non-terminating run, infinitely many
instances of each $\gamma$-formula will ultimately be produced on each branch.

The main problem with this proof procedure is the test closable(T): In gen- 
eral, the right combination of complementary literals has to be found in the leaf 
goals to compute a simultaneous unifier. This is NP-complete in the size of the 
leaf goals. ^ 

The problem of finding the right complementary pairs has to be solved in 
any free variable tableau proof procedure, backtracking or not. But although the 
worst-case complexity cannot be reduced, a speedup can be achieved by tuning 
the procedure to perform well in practical cases. 
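
To make the cost of the test closable(T) concrete, the following Python sketch tries every way of picking one complementary pair per leaf goal and unifies the chosen pairs simultaneously. It is a naive illustration, not the procedure proposed in this paper. The term encoding (tuples for atoms and compound terms, upper-case strings for free variables), all function names, and the reduction of goals to their literals are assumptions made for this example. The sample goals are the literal parts of the leaves n3, n4 and n5 of the tableau discussed in Sect. 3.1.

```python
from itertools import product

def is_var(t):
    # Assumed encoding: free variables are strings starting with an upper-case letter.
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    # Follow the bindings of substitution s until an unbound term is reached.
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:])

def unify(a, b, s):
    """Return an extension of substitution s that unifies a and b, or None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return unify(b, a, s)
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def complementary_pairs(goal):
    # A goal is a list of (is_positive, atom) literals.
    pos = [atom for positive, atom in goal if positive]
    neg = [atom for positive, atom in goal if not positive]
    return [(p, n) for p in pos for n in neg if unify(p, n, {}) is not None]

def closable(leaf_goals):
    """Search all combinations of one complementary pair per leaf goal."""
    for combo in product(*(complementary_pairs(g) for g in leaf_goals)):
        s = {}
        for p, n in combo:
            s = unify(p, n, s)
            if s is None:
                break
        else:
            return s          # simultaneous unifier for all chosen pairs
    return None

common = [(False, ("q", "b")), (True, ("p", "a"))]   # not q(b), p(a)
n3 = [(True, ("q", "X"))] + common
n4 = [(False, ("p", "X"))] + common
n5 = [(True, ("q", "Y"))] + n3
```

With these goals, `closable([n3, n4])` finds no unifier, while `closable([n5, n4])` returns a substitution binding Y to b and X to a. The number of combinations examined is the product of the numbers of complementary pairs per leaf, which is the source of the exponential worst case.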

The approach presented here makes the procedure more efficient by comput- 
ing closable(T) in an incremental fashion, based on the following observations: 

- If a pair of complementary literals is unifiable, it will stay unifiable after any
  extension. This should make an incremental algorithm worthwhile.

- An instantiation has to be found for the free variables introduced by the
  $\gamma$-rule. These only occur in the proof tree below the corresponding $\gamma$-formula.
  To take advantage of this locality, the algorithm should exploit the structure
  of the proof tree.



3.1 An Abstract Description 

In this section we shall abstract away from concrete representations of instan- 
tiations, and assume that we can perform calculations on (potentially infinite) 
sets of instantiations. How to represent these in an actual implementation is 
discussed in Sect. 3.2. 

Let

$$\mathrm{unif}(\phi, \psi) := \{\sigma \in \mathit{Sub}^{\circ} \mid \sigma(\phi) = \sigma(\psi)\}$$

be the set of instantiations that unify two atomic formulae. We define

$$\mathrm{cl}(G) := \bigcup_{\phi, \neg\psi \in G} \mathrm{unif}(\phi, \psi)$$

to be the set of instantiations under which a goal G is closed. For a node n of
a tableau, let lg(n) be the set of leaf goals associated with the leaves that are
descendants of n. Use this to define

$$\mathrm{cl}(n) := \bigcap_{G \in \mathrm{lg}(n)} \mathrm{cl}(G)$$

^ Unifiability can be decided in linear time [PW78], so with nondeterministic selection of
complementary pairs, closable(T) is in NP. On the other hand, SAT for propositional
clauses can be reduced to this problem by translating each clause to one leaf goal,
mapping propositional symbols to free variables, such that a goal is closable under an
instantiation to {0, 1} iff the clause is satisfied by the corresponding interpretation.
E.g., translate $A \lor \neg B$ to $\{p_A(0), \neg p_A(1), p_B(0), \neg p_B(1), p_A(A), \neg p_B(B)\}$.

to be the set of instantiations under which all leaves below n are closed.
Obviously, cl(root), where root is the root node, is the set of instantiations that
close the whole tableau.

We can take advantage of the tableau structure by expressing cl(n) recursively:
If a node n has only one child n', then cl(n) = cl(n'); for two children n', n'',
$\mathrm{cl}(n) = \mathrm{cl}(n') \cap \mathrm{cl}(n'')$. For a leaf n labeled with goal G, cl(n) = cl(G).

We shall clarify these notions using the following tableau: 



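$n_1$ : $\forall x.(qx \lor \neg px),\ \forall y.qy,\ \neg qb,\ pa$
 |
$n_2$ : $\underline{qX \lor \neg pX},\ \forall y.qy,\ \neg qb,\ pa,\ \forall x.(\ldots)$
 |-- $n_3$ : $\underline{qX},\ \forall y.qy,\ \neg qb,\ pa,\ \forall x.(\ldots)$
 `-- $n_4$ : $\underline{\neg pX},\ \forall y.qy,\ \neg qb,\ pa,\ \forall x.(\ldots)$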



$n_2$ was constructed by applying a $\gamma$-expansion at $n_1$, and $n_3$ and $n_4$ were
introduced by a $\beta$-expansion at $n_2$. The newly introduced formulae are underlined
in each goal. The goal at $n_3$ contains one complementary pair $qX, \neg qb$. So
$\mathrm{cl}(n_3) = \mathrm{unif}(qX, qb) = \{\sigma \in \mathit{Sub}^{\circ} \mid \sigma(X) = b\}$, the set of
instantiations that map X to b. Similarly,
$\mathrm{cl}(n_4) = \{\sigma \in \mathit{Sub}^{\circ} \mid \sigma(X) = a\}$, because of the complementary pair
$\neg pX, pa$. For $\mathrm{cl}(n_2)$ we have to find instantiations that close both leaf goals,
$\mathrm{cl}(n_2) = \mathrm{cl}(n_3) \cap \mathrm{cl}(n_4)$. There are obviously no such instantiations,
$\mathrm{cl}(n_2) = \emptyset$. The same holds for the root $n_1$, of course.
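
The recursive definition of cl can be replayed on this example with ordinary finite sets. The sketch below is only an illustration under a simplifying assumption: the infinite set of instantiations is replaced by the four maps from {X, Y} to {a, b}. The class `Node` and the helper `where` are hypothetical names introduced for this example.

```python
from itertools import product

VARS, CONSTS = ("X", "Y"), ("a", "b")
# Finite stand-in for the set of all instantiations: every map from VARS to CONSTS,
# each represented as a frozenset of (variable, constant) pairs.
SUB = [frozenset(zip(VARS, vals)) for vals in product(CONSTS, repeat=len(VARS))]

def where(**bindings):
    """All instantiations in SUB that map each named variable to the given constant."""
    return {s for s in SUB if all((v, c) in s for v, c in bindings.items())}

class Node:
    def __init__(self, name, closing=None, children=()):
        self.name = name
        self.closing = closing          # cl(G) for the goal G of a leaf, else None
        self.children = list(children)

def cl(n):
    """cl(n): cl(G) at a leaf, otherwise the intersection of cl over all children."""
    if not n.children:
        return set(n.closing)
    result = cl(n.children[0])
    for child in n.children[1:]:
        result &= cl(child)
    return result

n3 = Node("n3", closing=where(X="b"))   # complementary pair qX, not qb
n4 = Node("n4", closing=where(X="a"))   # complementary pair not pX, pa
n2 = Node("n2", children=[n3, n4])
n1 = Node("n1", children=[n2])
```

Here `cl(n3)` and `cl(n4)` each contain two instantiations (Y is unconstrained), while `cl(n2)` and `cl(n1)` are empty, as derived in the text.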

To get an incremental algorithm, we shall examine how the values of cl change
when a tableau expansion produces new complementary pairs. In general, one
expansion step might lead to several new complementary pairs in one goal, or
there might be two new goals, each of which can contain new complementary
pairs. We shall examine the changes to cl induced by one new complementary
pair $\phi, \neg\psi$ at one leaf $l$, called the selected leaf. If there are several new
complementary pairs, these changes may be applied consecutively for each of them.

Let $\mathrm{cl}_0$ denote the value of cl before taking into account $\phi, \neg\psi$, while cl
is the updated value. Possible closing instantiations are never destroyed by an
expansion step, so the sets cl can only grow when the tableau is expanded,
i.e. $\mathrm{cl}(n) \supseteq \mathrm{cl}_0(n)$ for all nodes of the tableau. Define
$\delta(n) := \mathrm{cl}(n) \setminus \mathrm{cl}_0(n)$ to be the set of new closing instantiations.
Obviously, $\mathrm{cl}(n) = \mathrm{cl}_0(n)$, so $\delta(n) = \emptyset$, if the selected leaf $l$ is not
a descendant of n. In other words, $\delta(n)$ is non-empty only for nodes n on the path
between $l$ and the root of the tableau. For the selected leaf $l$, $\delta$ is given by

$$\delta(l) = \mathrm{unif}(\phi, \psi) \setminus \mathrm{cl}_0(l) .$$

Using the recursive expression for cl(n), we can ‘propagate’ this change up
the branch towards the root. We obtain $\delta(n) = \delta(n')$ for all nodes n with one
child n'. For a node n with two children n' and n'', we assume that $l$ lies below
n'. This implies that $\mathrm{cl}(n'') = \mathrm{cl}_0(n'')$, so we have

$$\begin{aligned}
\delta(n) &= \mathrm{cl}(n) \setminus \mathrm{cl}_0(n) \\
          &= (\mathrm{cl}(n') \cap \mathrm{cl}(n'')) \setminus (\mathrm{cl}_0(n') \cap \mathrm{cl}_0(n'')) \\
          &= (\mathrm{cl}(n') \cap \mathrm{cl}_0(n'')) \setminus (\mathrm{cl}_0(n') \cap \mathrm{cl}_0(n'')) \\
          &= (\mathrm{cl}(n') \setminus \mathrm{cl}_0(n')) \cap \mathrm{cl}_0(n'') \\
          &= \delta(n') \cap \mathrm{cl}_0(n'')
\end{aligned}$$

The case where $l$ lies below n'' is of course symmetrical.

The central idea of the incremental closure procedure is to keep track of the
sets cl(n) and update them by propagating the additional closures $\delta(n)$ up the
branch using this equation. As soon as $\delta(\mathit{root}) \neq \emptyset$, the tableau must be
closable.

We shall continue the example from above to demonstrate the propagation
of $\delta$ values.

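$n_1$ : $\forall x.(qx \lor \neg px),\ \forall y.qy,\ \neg qb,\ pa$
 |
$n_2$ : $qX \lor \neg pX,\ \forall y.qy,\ \neg qb,\ pa,\ \forall x.(\ldots)$
 |-- $n_3$ : $qX,\ \forall y.qy,\ \neg qb,\ pa,\ \forall x.(\ldots)$
 |    |
 |   $n_5$ : $qY,\ qX,\ \neg qb,\ pa,\ \forall x.(\ldots),\ \forall y.qy$
 `-- $n_4$ : $\neg pX,\ \forall y.qy,\ \neg qb,\ pa,\ \forall x.(\ldots)$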

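There is a new node $n_5$ stemming from a $\gamma$-expansion at $n_3$. This leads to
the new complementary pair $qY, \neg qb$. Not taking this into account leads to:
$\mathrm{cl}_0(n_3) = \mathrm{cl}_0(n_5) = \{\sigma \in \mathit{Sub}^{\circ} \mid \sigma(X) = b\}$,
$\mathrm{cl}_0(n_4) = \{\sigma \in \mathit{Sub}^{\circ} \mid \sigma(X) = a\}$, and
$\mathrm{cl}_0(n_1) = \mathrm{cl}_0(n_2) = \emptyset$. These are the values we derived for cl above.
Now, including $qY, \neg qb$, we get

$$\delta(n_5) = \mathrm{unif}(qY, qb) \setminus \mathrm{cl}_0(n_5) = \{\sigma \in \mathit{Sub}^{\circ} \mid \sigma(Y) = b \text{ and } \sigma(X) \neq b\}$$

This allows us to calculate $\delta$ for all nodes between $n_5$ and the root: After
$\delta(n_3) = \delta(n_5)$, we have

$$\delta(n_2) = \delta(n_3) \cap \mathrm{cl}_0(n_4) = \{\sigma \in \mathit{Sub}^{\circ} \mid \sigma(Y) = b \text{ and } \sigma(X) = a\}$$

This in turn leads to $\delta(n_1) = \delta(n_2) \neq \emptyset$, so the proof is closable, namely
by any instantiation that maps X to a and Y to b.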
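
The propagation of $\delta$ along the branch can be checked mechanically. The following sketch again replaces the infinite set of instantiations by the four maps from {X, Y} to {a, b}, which is an assumption of this example. The function `propagate` and the dictionaries `parent` and `sibling` are hypothetical names that encode the tableau above.

```python
from itertools import product

VARS, CONSTS = ("X", "Y"), ("a", "b")
# Finite stand-in for the set of all instantiations (an assumption of this sketch).
SUB = [frozenset(zip(VARS, vals)) for vals in product(CONSTS, repeat=len(VARS))]

def where(**bindings):
    """All instantiations in SUB that map each named variable to the given constant."""
    return {s for s in SUB if all((v, c) in s for v, c in bindings.items())}

def propagate(leaf, unifiers, cl0, parent, sibling):
    """Return delta(n) for every node n on the path from `leaf` up to the root."""
    delta = {leaf: unifiers - cl0[leaf]}          # delta(l) = unif(phi, psi) minus cl0(l)
    node = leaf
    while node in parent:
        up = parent[node]
        if node in sibling:                       # split: intersect with the other branch
            delta[up] = delta[node] & cl0[sibling[node]]
        else:                                     # single child: pass the set on unchanged
            delta[up] = delta[node]
        node = up
    return delta

parent = {"n5": "n3", "n3": "n2", "n4": "n2", "n2": "n1"}
sibling = {"n3": "n4", "n4": "n3"}
cl0 = {
    "n1": set(),
    "n2": set(),
    "n3": where(X="b"),
    "n4": where(X="a"),
    "n5": where(X="b"),
}

# New complementary pair qY, not qb at the selected leaf n5.
delta = propagate("n5", where(Y="b"), cl0, parent, sibling)
```

The entry `delta["n1"]` is the single instantiation mapping X to a and Y to b, so the set reaching the root is non-empty and the tableau is closable.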

Still assuming we could calculate with infinite sets of instantiations, we shall
now show how the computation and propagation of the $\delta$ values is organized.
The procedure shall be described in a state-based way, but it turns out that
operations on the state will typically be local. For that reason, we shall take an
object-oriented view.

Every leaf goal has an associated Sink object. A sink is an object capable of
receiving a set of instantiations and performing some computation on it. This
is realized by giving a put method to the Sink objects that takes a set of
instantiations as parameter. The proof procedure will call this method after every
expansion step with any set $\delta(n)$ of new closing instantiations coming from a
new complementary pair $\phi, \neg\psi$, i.e.

    goal.sink.put($\mathrm{unif}(\phi, \psi) \setminus \mathrm{cl}_0(n)$)

We shall see further down how $\mathrm{cl}_0(n)$ is extracted from the data structures.

There are two kinds of objects that act as sinks. One is the RootSink, which
will receive $\delta(\mathit{root})$. This contains a flag closable that records whether a
non-empty set of instantiations has yet been received:

    RootSink::put(S) is
        if S nonempty then
            closable := true
        end
    end

The other kind of sink is provided by Merger objects which correspond to
the splits in the tableau and are responsible for calculating the intersections
$\delta(n) = \delta(n') \cap \mathrm{cl}_0(n'')$.

The structure of a Merger is shown in the following diagram: 



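[Diagram: a Merger consisting of a left and a right MergerSink, each with a buffer B.
The left sink receives $\delta(n')$, the right sink receives $\delta(n'')$, the two sinks are
linked by the association other, and the result $\delta(n)$ is sent to out.]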



It consists of two MergerSink objects, one to receive $\delta(n')$ and one for $\delta(n'')$. The
current set cl(n'), resp. cl(n''), is stored in a buffer B in the corresponding input
sink. Furthermore there is a reference out to an output sink, to which $\delta(n)$ will
be passed on. The two sinks are mutually connected by an association other, so
they can access each other's buffers via other.B. Accordingly, the put method of
the MergerSink object works as follows:

    MergerSink::put(S) is
        J := S $\cap$ other.B     // $\delta(n) = \delta(n') \cap \mathrm{cl}_0(n'')$
        B := B $\cup$ S           // $\mathrm{cl}(n') = \mathrm{cl}_0(n') \cup \delta(n')$
        out.put(J)
    end

It only remains to see how $\mathrm{cl}_0(n)$ can be computed to determine
$\delta(n) = \mathrm{unif}(\phi, \psi) \setminus \mathrm{cl}_0(n)$ for a new closure. There are two cases: If the
goal is associated with the RootSink, $\mathrm{cl}_0(n)$ must be empty, because the proof
would otherwise be closed already. If it is associated with a MergerSink, this sink
contains the current value of $\mathrm{cl}_0(n)$ in its buffer B.

The proof procedure is now changed as follows:

    T := initial tableau for S
    associate RootSink r with goal of T
    while ( not r.closable ) do
        if expandable(T) then
            select possible expansion of T
            expand T
            possibly generate new Merger
            handle new complementary pair
        else
            answer 'satisfiable'
        end
    end
    answer 'unsatisfiable'

At the initialization, a RootSink object is associated with the single goal of 
the tableau. 

In the case of a $\beta$-expansion, i.e. a new split in the tableau, the step ‘possibly
generate new Merger’ creates a new Merger object. The buffers B are initialized
with the current value of $\mathrm{cl}_0$ of the parent node. The output of the merger object
is sent to the sink s of the parent node, and the new child nodes are associated
with the input sinks of the merger, as shown in the following diagram:

[Diagram: the sink s of the parent node receives the output of the new Merger,
whose two input sinks are associated with the two new child nodes.]

After the sinks have been updated, the procedure checks for new complementary
pairs introduced by the expansion, calculates $\delta(n) = \mathrm{unif}(\phi, \psi) \setminus \mathrm{cl}_0(n)$ for
each of them, and sends $\delta(n)$ into the associated sink of the goal.

After all new closing instantiations have been passed to the sinks, the tableau 
is closable, if the closable flag of the root sink has been set. 
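
The Sink structure of this section can be sketched in Python as follows. This is a minimal illustration, not the code of the actual prover: instantiation sets are again taken from a finite universe (the four maps from {X, Y} to {a, b}), the helper functions `split` and `new_pair` are hypothetical names, and we assume that the child created by a $\gamma$-expansion simply keeps the sink of its parent. The calls at the end replay the running example.

```python
from itertools import product

VARS, CONSTS = ("X", "Y"), ("a", "b")
# Finite stand-in for the set of all instantiations (an assumption of this sketch).
SUB = [frozenset(zip(VARS, vals)) for vals in product(CONSTS, repeat=len(VARS))]

def where(**bindings):
    """All instantiations in SUB that map each named variable to the given constant."""
    return {s for s in SUB if all((v, c) in s for v, c in bindings.items())}

class RootSink:
    def __init__(self):
        self.closable = False
        self.buffer = set()           # cl0 at the root stays empty until closure

    def put(self, s):
        if s:
            self.closable = True

class MergerSink:
    def __init__(self, out, cl0):
        self.out = out
        self.buffer = set(cl0)        # B: cl(n') of the child n' that uses this sink
        self.other = None

    def put(self, s):
        j = s & self.other.buffer     # delta(n) = delta(n') intersected with cl0(n'')
        self.buffer |= s              # cl(n') = cl0(n') united with delta(n')
        self.out.put(j)

def split(sink):
    """Beta-expansion: put a new Merger in front of `sink`, return its two input sinks."""
    left, right = MergerSink(sink, sink.buffer), MergerSink(sink, sink.buffer)
    left.other, right.other = right, left
    return left, right

def new_pair(sink, unifiers):
    """Send delta = unif(phi, psi) minus cl0 into the sink of the goal."""
    sink.put(unifiers - sink.buffer)

root = RootSink()
s3, s4 = split(root)                  # beta-expansion at n2 creates n3 and n4
new_pair(s3, where(X="b"))            # pair qX, not qb at n3
new_pair(s4, where(X="a"))            # pair not pX, pa at n4
closable_before = root.closable
new_pair(s3, where(Y="b"))            # pair qY, not qb at n5, which shares the sink of n3
```

After the first two pairs the root has received only empty sets, so `closable_before` is false. The third pair yields the instantiation with X = a and Y = b, which passes the Merger and sets the flag of the root sink.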



3.2 Representation of Instantiation Sets 

We have so far assumed that we can compute with infinite sets of instantiations.
To get closer to a concrete implementation, we have to show how these
may be represented with finite data structures. We shall briefly describe the
representations used in the prototypical prover PrInS (Prover with Instance
Streams, referring to the streams of closing instantiations passed between the
Sink objects).

We use syntactic equality constraints to denote sets of instantiations: These
are first-order formulae with equality as the only predicate symbol, which are
interpreted over the free term algebra. A constraint represents the set of
instantiations that satisfy it. E.g. unif(p(X, b), p(a, Y)) yields a constraint
X = a & Y = b that is satisfied by all instantiations that map X to a and Y to b.
As usual for unification, these constraints are kept in a ‘solved form’ that makes
it easy to determine their satisfiability. The intersection of sets of instantiations
required in the Mergers corresponds to the conjunction of constraints. In the
updates of the MergerSinks’ buffers B, set union is required, which could be
represented as disjunction of constraints. There is however no need to handle
arbitrary disjunctive constraints: The buffers B can be implemented as lists of
conjunctive constraints. The put method then looks as follows:

    MergerSink::put(C) is
        foreach D in other.B do
            J := C & D
            if J satisfiable then
                out.put(J)
            end
        end
        add C to B
    end

The constraints passed into the put methods are then purely conjunctive. 
Finally, the set difference operation can be modeled either by taking negation 
into the constraint language, or by introducing subsumption checks at various 
places. We will not elaborate this here. See e.g. [Com91] for a survey on syntactic 
constraint solving methods. 
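
A constraint-based put method can be sketched as below. The sketch makes a strong simplifying assumption: a conjunctive constraint in solved form is a Python dict that binds variables to constants only, so conjunction is a consistency-checked merge. Bindings to compound terms and the set difference operation are left out. The function `conj` and the attribute `received` are hypothetical names.

```python
def conj(c, d):
    """Conjunction of two solved-form constraints, or None if it is unsatisfiable."""
    out = dict(c)
    for var, val in d.items():
        if out.setdefault(var, val) != val:   # the same variable bound to two constants
            return None
    return out

class RootSink:
    def __init__(self):
        self.received = []            # satisfiable constraints that reached the root

    @property
    def closable(self):
        return bool(self.received)

    def put(self, c):
        self.received.append(c)

class MergerSink:
    def __init__(self, out):
        self.out = out
        self.buffer = []              # B: list of conjunctive constraints
        self.other = None

    def put(self, c):
        for d in self.other.buffer:
            j = conj(c, d)
            if j is not None:         # only satisfiable conjunctions are passed on
                self.out.put(j)
        self.buffer.append(c)

root = RootSink()
left, right = MergerSink(root), MergerSink(root)
left.other, right.other = right, left

left.put({"X": "b"})                  # left branch closable for X = b
right.put({"X": "a"})                 # right branch closable for X = a
left.put({"Y": "b"})                  # left branch also closable for Y = b
```

The first two constraints are contradictory, so nothing reaches the root. The third one combines with X = a from the other buffer, and the root receives the constraint X = a & Y = b.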



4 Refinements 

The Incremental Closure approach has the desirable property that it can eas- 
ily be refined in a number of ways. We stress this point, because incremental 
closure is surely not the answer to all problems in automated theorem proving. 
It is therefore important to see how this new technique can be combined with 
successful existing approaches. 

This section presents a number of possible refinements. While some of them 
are particular to the incremental closure technique, many are adaptations of 
refinements known from backtracking procedures. 



4.1 Restriction of Instantiation Domains 

On the abstract level, instead of passing instantiations for all free variables
through the sink structure, it is possible to define the method to work with
instantiations of only the free variables actually present at a certain tableau
node.

For instance, in the following tableau:^

          $\forall u, v.\, p(X, u, v) \lor q(u, v, Y)$
                        |
             $p(X, U, V) \lor q(U, V, Y)$
                /                \
        $p(X, U, V)$        $q(U, V, Y)$

assume that the left branch may be closed for instantiations satisfying X = U,
and the right branch for U = Y. The Merger corresponding to the split will find
that both branches are closable with X = U & U = Y. But U does not exist in the
tableau above the $\gamma$-expansion, so this sub-tableau can be considered closable
for all instantiations satisfying X = Y.

In terms of constraints, the restriction to a subset of occurring variables
corresponds to existential quantification: X = Y is equivalent to
$\exists U, V.\ X = U \mathbin{\&} U = Y$. This variation may be implemented by introducing a new
kind of Sink object at every $\gamma$-expansion that computes the domain restriction.
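
For constraints that only equate variables and constants, the existential quantification can be computed by grouping the symbols into equivalence classes and keeping only the visible members of each class. The sketch below illustrates this under that restriction (no compound terms). The function `restrict` and its list-of-pairs representation are assumptions made for this example.

```python
def is_var(t):
    # Assumed convention: free variables start with an upper-case letter.
    return t[:1].isupper()

def restrict(equations, keep):
    """Project a conjunction of equations onto the variables in `keep`.

    `equations` is a list of (left, right) pairs over variables and constants.
    Returns a list of equations over constants and kept variables, or None if
    the conjunction is unsatisfiable.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for left, right in equations:
        parent[find(left)] = find(right)

    classes = {}
    for x in list(parent):
        classes.setdefault(find(x), []).append(x)

    result = []
    for members in classes.values():
        consts = sorted(m for m in members if not is_var(m))
        if len(consts) > 1:                 # two distinct constants equated
            return None
        visible = consts + sorted(m for m in members if is_var(m) and m in keep)
        result += list(zip(visible, visible[1:]))
    return result

projected = restrict([("X", "U"), ("U", "Y")], keep={"X", "Y"})
```

For the example above, `projected` is the single equation between X and Y. The constraint X = V & V = Y is projected to the same result, which is what makes a subsequent subsumption check effective.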

This modification has several advantages: 

- As in the example above, the existentially quantified constraints can often
  be simplified. Thus, they consume less space in the buffers B.

- Assume that a new combination of complementary literals leads to
  X = V & V = Y in the example. This would have to be handled separately in
  the original version, but domain restriction leads to X = Y as above. A
  subsumption check can be used to avoid further redundant computations.



4.2 Delete Propositionally Closable Branches 

It occasionally happens that a sub-tableau is closable under any instantiation. 
This is the case in proofs of propositional formulae, where no free variables 
are required at all, but it can also happen with first-order formulae if there is 
a complementary pair that is unifiable without any further instantiation. It is 
useless to expand that part of the tableau any further, because no more closing 
instantiations can be found. The sub-tableau is called propositionally closable. 

In the implementation using constraints, this corresponds to a constraint 
(equivalent to) true being passed through the sink structure. If this is detected, 
the corresponding goals and the Sink structure may be deleted to reduce memory 
consumption. 

4.3 Using Buffers for Goal Selection 

So far, the Sink structure built during a proof has only been used to check 
whether the tableau is closable. It turns out that it can also be useful for goal 
selection, i.e. deciding on which goal the next expansion step should take place. 

^ We shall adopt a more familiar and compact notation for tableaux here and in the
sequel, by writing only the newly introduced formulae of each goal.

The buffers B contain representations of the closing instantiations for sub-tableaux.
If, for instance, one subtree of a node has no closing instantiation yet,
while the other does, expansions should take place in the first subtree, until at 
least one closing instantiation has also been found there. It is also possible to 
use the size of the buffers or of the constraints they contain for heuristics that 
tend to expand branches that seem harder to close. 

4.4 Pruning 

Pruning (see e.g. [BH98,BFN96]) is an important technique known from
backtracking procedures that can reduce the search space dramatically: The prover
keeps track of the ancestry of each formula, i.e. the set of formulae in the tableau
which were used to derive it.^ If a branch is closed by unifying a particular
complementary pair $\phi, \neg\psi$, the prover examines the $\beta$-expansions that occurred
earlier on the branch. If for a particular expansion, neither the ancestry of $\phi$
nor that of $\psi$ contains the sub-formula $\beta_{1/2}$ introduced by the expansion, then
the closure would have been possible without that expansion. Consequently, the
expansion can be removed a posteriori, saving the work of closing the other
branch introduced by it. Of course, the decision for that particular complementary
pair might be revised in a backtracking step, and then the expansion has
to be reintroduced.

We will now see how the pruning technique can be adapted to the incremental
closure approach. We record for each formula an ancestry of $\beta$-expansions on
which it depends. We can use references to the Merger objects for this, as there is
exactly one of these for each $\beta$-expansion. In the abstract view of the procedure,
we now compute with sets of pairs $(\sigma, h)$ of instantiations with ancestries, instead
of just sets of instantiations. We have to redefine the operations unif, $\cap$, $\cup$ and
$\setminus$ to work with sets of such pairs. In particular, the $\cap$ operation in the Merger
has to combine the histories of instantiations.

The ‘pruning’ takes place when a Merger m receives an instantiation that
does not have m in its ancestry: it can pass such an instantiation to the output
sink independently of the contents of the buffer of the other branch:

    MergerSink::put(S) is
        P := { $(\sigma, h) \in S$ | this Merger $\notin h$ }
        out.put(P)
        S := S $\setminus$ P
        J := S $\cap$ other.B
        B := B $\cup$ S
        out.put(J)
    end

As the complementary pair that led to this closing instantiation might not be
the one that is ultimately needed to close the proof, the other branch may not, in
general, be deleted. But the gain of passing a closing instantiation further up the
Sink structure turns out to be very important in practice. Of course, in the case
of a propositional closure, it is even possible to delete the whole sub-tableau,
giving an even greater advantage.

^ Actually, it suffices to record only the $\beta$-subformulae introduced by $\beta$-expansions.
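
The pruning variant of the put method can be sketched as follows. This is an illustration under several assumptions: each closure is a pair of an instantiation and an ancestry, the ancestry is a frozenset of Merger identifiers (plain strings here), and the intersection matches pairs with equal instantiations and unites their ancestries. The function `make_merger` is a hypothetical helper, and the ancestries in the demonstration are set by hand rather than derived from formulae.

```python
class RootSink:
    def __init__(self):
        self.closable = False

    def put(self, s):
        if s:
            self.closable = True

class MergerSink:
    def __init__(self, merger_id, out):
        self.merger_id = merger_id
        self.out = out
        self.buffer = set()           # B: set of (instantiation, ancestry) pairs
        self.other = None

    def put(self, s):
        # P: closures whose ancestry does not mention this Merger's beta-expansion.
        pruned = {(sigma, h) for sigma, h in s if self.merger_id not in h}
        self.out.put(pruned)
        rest = s - pruned
        # Intersection on instantiations; ancestries of matching entries are united.
        joined = {(sigma, h | h2)
                  for sigma, h in rest
                  for sigma2, h2 in self.other.buffer if sigma == sigma2}
        self.buffer |= rest
        self.out.put(joined)

def make_merger(merger_id, out):
    left, right = MergerSink(merger_id, out), MergerSink(merger_id, out)
    left.other, right.other = right, left
    return left, right

sigma = frozenset({("X", "a")})

# Case 1: the closure does not depend on the split m1, so it bypasses the Merger.
root_a = RootSink()
left_a, right_a = make_merger("m1", root_a)
left_a.put({(sigma, frozenset())})

# Case 2: the closure depends on m1, so it must wait for the other branch.
root_b = RootSink()
left_b, right_b = make_merger("m1", root_b)
left_b.put({(sigma, frozenset({"m1"}))})
waiting = root_b.closable
right_b.put({(sigma, frozenset({"m1"}))})
```

In the first case the root is marked closable although the buffer of the right branch is still empty. In the second case the root is reached only after both branches have delivered the instantiation.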

4.5 Constraints 

A wide range of refinements and variations of the incremental closure method
becomes possible if the prover is modified to work with constrained formulae. A
constrained formula is a pair $\phi \ll C$ of a formula $\phi$ and a constraint C. The
meaning of this is that $\phi$ may be used to close a branch only if the instantiation
of the free variables of the tableau satisfies the constraint. Constrained formulae
may be used to port tableau rules that normally require an instantiation to the
incremental closure method, and also for restrictions of the search procedure
that limit the permissible instantiations in some way.



Rules introducing constraints. In [Gie00a], a simplification rule using
constraints was presented. Consider for instance a tableau branch containing the
formulae

(1) : $\forall y.(q(y) \land p(X))$

(2) : $p(a)$

In a backtracking framework, (1) could be simplified with (2), by instantiating X
with a, then replacing the occurrence of p(X) by true, and finally rewriting the
formula to $\forall y.q(y)$. The original, unsimplified formula (1) could be discarded.
It would however be necessary to backtrack over the instantiation of X. Using
constraints, one would not perform the instantiation explicitly; instead one would
derive the constrained formula

(3) : $\forall y.q(y) \ll X = a$

and replace the original formula by

(4) : $\forall y.(q(y) \land p(X)) \ll X \neq a$ ,

using the constraint to keep track of the fact that for instantiations with X = a,
formula (3) should be used instead of (4).

This approach blends perfectly with the incremental closure method. Only
one change is needed: When a new complementary pair $\phi \ll C$, $\neg\psi \ll D$ is found,
the constraints of the formulae have to be added to the unification constraint.
One defines:

$$\mathrm{unif}(\phi \ll C, \psi \ll D) := \mathrm{unif}(\phi, \psi) \mathbin{\&} C \mathbin{\&} D .$$
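
The extended definition of unif can be illustrated with a small sketch. It assumes a deliberately restricted setting: atoms have only variables and constants as arguments, a constraint is a list of equations and disequations, and the signature is rich enough that two unconstrained variables can always be given different values. All names are hypothetical.

```python
def is_var(t):
    # Assumed convention: free variables start with an upper-case letter.
    return t[:1].isupper()

def unif_atoms(p, q):
    """Equations unifying two atoms (predicate, arg, ...), or None if they clash."""
    if p[0] != q[0] or len(p) != len(q):
        return None
    return [("=", s, t) for s, t in zip(p[1:], q[1:])]

def unif_constrained(left, right):
    """unif(phi << C, psi << D) := unif(phi, psi) & C & D."""
    (phi, c), (psi, d) = left, right
    u = unif_atoms(phi, psi)
    return None if u is None else u + c + d

def satisfiable(constraint):
    """Decide a conjunction of ('=', s, t) and ('!=', s, t) over variables and constants."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    for op, s, t in constraint:
        if op == "=":
            parent[find(s)] = find(t)
    const_of = {}
    for x in list(parent):
        if not is_var(x) and const_of.setdefault(find(x), x) != x:
            return False              # two distinct constants in one class
    return all(find(s) != find(t) for op, s, t in constraint if op == "!=")

# q(Y) << X = a  against  not q(b): closable under Y = b & X = a.
closing = unif_constrained((("q", "Y"), [("=", "X", "a")]), (("q", "b"), []))

# p(X) << X != a  against  not p(a): the added constraint rules this closure out.
blocked = unif_constrained((("p", "X"), [("!=", "X", "a")]), (("p", "a"), []))
```

The first call yields a satisfiable constraint, the second an unsatisfiable one, so only the first would be sent into the Sink structure.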

The same approach can be used to build an incremental closure version of
hyper tableaux ([Bau98]) with (rigid) free variables: For a hyper tableau rule

$p(a) \to q(y) \lor r(y)$

and a literal p(X), one can produce an expanded tableau

                  $p(X)$
                /        \
    $q(Y) \ll X = a$    $r(Y) \ll X = a$

where Y is a new free variable.

Another use for constrained formulae is equality handling: In [NR01], Sect. 5,
Nieuwenhuis and Rubio point out the importance of using ordering constraints
to reduce the search space in automated equality reasoning, and [Bec94] presents
a constraint based method for equality handling in tableaux that can be neatly
integrated with the incremental closure approach.

To use ordering constraints, one simply has to extend the constraint language
to contain an ordering predicate $\succ$ in addition to the syntactic equality =. The
semantics of constraints is given by fixing the interpretation of this predicate to
some suitable reduction ordering.



Restrictions introducing constraints. Constraints can also be introduced
to adapt certain proof search restrictions from backtracking procedures, in a
similar way to what is described in [LS01], Sect. 8.

One example is regularity. In its simplest form, the regularity condition re- 
quires that no rule is applied that introduces a formula on a branch that already 
occurs on it. While this is easy to enforce for the (purely academic) case of ground 
tableaux, it requires a certain effort when free variables are used, because two 
formulae might become equal through an instantiation. 

In the incremental closure framework, constraints can be used to ensure
regularity. A goal containing p(a) and $p(X) \lor q(X)$ might be expanded thus:

                     $p(a)$
                $p(X) \lor q(X)$
                /              \
    $p(X) \ll X \neq a$    $q(X) \ll X \neq a$

Then, if the instantiation X = a ultimately does lead to a proof of, say, the
left branch, the ancestry of that instantiation cannot contain this $\beta$-split, so the
pruning mechanism will take care that the redundant splitting expansion does
no harm.

5 Experimental Results 

In this section we shall quote some results obtained with an experimental
implementation of the incremental closure procedure.

To cleanly separate the effects of incremental closure from those of various
refinements, the technique was tested with a very simple implementation: No
refinements like pruning or simplification were employed. Only the goal selection
strategy from Sect. 4.3 was used. Formulae in goals are kept in a list and the
first formula in this list is used for expansion. On the other hand, an equally
simple backtracking prover was implemented using the same data structures.
Iterative deepening was applied on the number of $\gamma$-expansions per branch.
The proof search of this backtracking prover is practically identical to that of
leanTAP [BP94].

Comparing the two provers on some simple problems (harder problems require
refinements in both cases) shows that the incremental closure prover is
nearly always faster. The difference is particularly noticeable in cases that
require heavy backtracking, i.e. where many complementary pairs are found that
do not lead to a proof. This is the case, e.g. in SYN054+1 from the TPTP library,
where the backtracking procedure requires 14317 rule applications and 37817
unifications versus 151 rule applications and 680 unifications with incremental
closure. The problem family $p(c) \land \ldots \land \forall x.(p(x) \to p(f(x)))$ also shows
this phenomenon very clearly, because there are many possible closures, and
only few of them are correct for a low depth limit. For n = 15, the backtracking
prover performs over 300,000 rule applications and over 1,000,000 unifications,
while the incremental closure procedure requires 32 rule applications and 133
unifications.

The current implementation incorporates most of the refinements given in
Sect. 4. It proves 161 of the 237 full-first-order theorems without equality in
TPTP v.2.3.0, one of which, SYN067+1, is rated 0.67.

One would expect the described procedure to give rise to memory problems. 
But with resource limits of 300 s CPU and 160 MB heap, only about 30% of the 
failures were due to lack of memory. We hope to further reduce this amount by 
implementing more powerful rules and a better goal selection strategy, leading 
to shorter proofs. The idea is that without backtracking, one can afford to spend 
more time on individual rule applications, as these do not need to be repeated. 



6 Conclusion, Related Work, and Future Research 

We have presented an approach to eliminate backtracking from a proof procedure 
for free variable tableaux. It is built around the idea of incrementally computing 
instantiations that close sub-tableaux until one global instantiation is found 
that closes the whole tableau. We have demonstrated that this technique allows 
various refinements to be incorporated in the calculus and the procedure. Finally 
we have given some experimental results obtained by comparing the approach 
with a backtracking solution. 

A similar approach has independently been described by B. Konev and T. Jebelean
in [KJ00]. However, they hardly consider possibilities for refinements.

Further work includes experiments with integrated equality handling, search 
for specialized data structures, e.g. for the buffers B, better goal selection strate- 
gies and adaptations of further refinements from backtracking procedures. It 
might also be interesting to experiment with a parallelized implementation.

Abstract. We present a polynomial translation of dag sequent proofs
into tree sequent proofs for first-order classical and intuitionistic logic. 
The basic idea is to interpret a reference in a dag proof as a lemma 
application, which is then simulated using an application of the cut rule. 
The result of this translation is a tree proof with cuts, which are only 
applied in order to “factorize” identical subproofs. We illustrate a central 
application of the presented cut-based translation, that is automated 
extraction of modular programs from first-order intuitionistic proofs. 



1 Introduction 

Sequent calculi are the common proof systems in interactive proof assistants, 
and are widely used to present machine found proofs in a way comprehensi- 
ble by humans. Integrating an efficient automated deduction system for proving 
suitable subproblems, e.g., first-order intuitionistic formulae, into these proof as- 
sistants is a non-trivial task, because their underlying calculi are not suited for 
automated proof search. Automated deduction systems are typically based on 
redundancy-reduced calculi, like various kinds of connection and tableau calculi, 
special forms of (prefixed) sequent systems, or resolution. Consequently, auto- 
matically generated proofs have to be translated back into the underlying calculi 
of the proof assistants. 

Automated proof search in sequent systems, however, can be performed 
in two directions: either from the end sequent towards the axioms (backward 
search), or from the axioms towards the end sequent (forward search). Although 
a naive implementation of forward proof search is hardly efficient, the picture 
changes if we respect the subformula property: Only such inferences are applica- 
ble which introduce a subformula of the formula to be proven into the sequent. 
Forward search in sequent calculi is often called inverse because it works in 
the inverse direction compared to the more usual backward search procedure. 
Many “inverse calculi” for different logics have been developed in the past by 
Maslov, Mints, Voronkov, Tammet and others (see [17] or [6] for further details
and references). These calculi are well-suited for proof search because derived
sequents can be used more than once as premises and thus allow compact proof
representations.

The proofs resulting from inverse proof calculi are essentially sequent proofs 
in dag form (dag = directed acyclic graph). In a dag sequent proof, every sequent
that occurs identically multiple times is only proven once. The remaining non- 
proven occurrences of the sequent receive a reference to this proof in a way, such 
that the proof tree combined with all the references induces a directed acyclic 
graph. From this viewpoint, dag sequent proofs are closely related to the proof 
calculi underlying the interactive proof systems. However, these calculi are often 
based on a tree format for sequent proofs, which prevents a direct interpretation 
of the dag proofs by these systems. In order to integrate forward-directed proof 
search procedures into interactive proof assistants, one has to transform the 
resulting dag sequent proofs into tree sequent proofs. 

In this paper, we present a polynomial-time computable translation of dag 
sequent proofs (even with cuts) into tree sequent proofs for first-order classical 
and intuitionistic logic. For this, we develop a general procedure which works 
uniformly for proofs in Gentzen-like sequent calculi $\mathcal{LK}$ and $\mathcal{LJ}$ [12] as well
as in a multiple-conclusioned intuitionistic sequent calculus $\mathcal{LJ}_{mc}$ according to
Fitting [11]. Our transformation is based on a simulation of dag proofs within 
the corresponding (tree) sequent calculi plus the cut rule. Each reference in a 
dag proof is interpreted as a lemma application, where the proof of the lemma 
occurs at the identical sequent the reference points to. The simulation can be 
described as follows: The sequent at the reference is encoded into a first-order 
formula, the formula image of the sequent, and introduced with an application 
of the cut rule. The lemma proof from the dag proof is used to prove the formula 
image. At the original references in the dag proof, we unfold the formula image 
of the identical sequent and thus, prove each reference with a short tree proof. 
The repeated process of cut introduction, unfolding formula images, and adding 
short proofs for references in the dag proof eventually leads to a tree proof. No 
additional search is involved, since the references in the dag proof indicate which 
sequents are subject to cut introduction. Moreover, the size of the resulting tree 
proof increases at most polynomially with respect to the size of the original dag 
proof. 

We illustrate a main application of the presented cut-based translation, 
i.e. automated extraction of modular programs from intuitionistic sequent proofs. 
For this, we use the fact that the program extracted from an application of the 
cut rule can be interpreted as a procedure call. In our transformation, a cut is in- 
troduced to “factorize” subproofs of identical sequents in the original dag proof. 
If we regard the program extracted from this subproof as a procedure, its factor- 
ization using the cut rule realizes a (main) program which calls the procedure. 
When supporting interactive proof assistants with automated theorem provers 
for first-order intuitionistic logic (e.g., [19]), the design of the resulting programs 
strongly depends on the way the machine-found proofs are translated into the 
proof calculi of the systems. Standard cut-based transformations [15,4,18] on the 
one hand result in “unintuitive” programs with respect to the original specification, 
i.e. the formula to be proved valid. This is due to the fact that the cut rule 
is applied locally in order to overcome certain divergences between proof calculi. 
Permutation-based translations on the other hand may cause extensive dupli- 
cations of subproofs [9,10], which means that the extracted programs contain 
identical code fragments multiple times. In contrast to that, our transformation 
produces “intuitive” programs, i.e. modular programs without duplications of 
identical code fragments, since the cut-rule is used as a procedural programming 
concept. 

Section 2 introduces basic concepts and proof calculi we are using. In Sec- 
tion 3, we present and prove the polynomial transformation from dag sequent 
proofs to tree proofs with cuts. We show how our translation realizes the con- 
struction of procedural programs from proofs (Section 4). The conclusion con- 
tains a discussion of related work and some remarks on extensions of the trans- 
lation method to other non-classical logics. 

2 Definition and Notations 

Throughout this paper, we use a first-order language consisting of variables, 
constants, function symbols, predicate symbols, logical connectives, quantifiers 
and punctuation symbols. Terms and formulae are defined according to the usual 
formation rules. For a first-order formula F, let $FV(F)$ denote the set of all free 
variables in F. 



Sequent calculi. A sequent S is an ordered pair of the form $\Gamma \vdash \Delta$, 
where $\Gamma$, $\Delta$ are finite multisets of first-order formulae. $\Gamma$ is the antecedent 
of S and $\Delta$ is the succedent of S. The semantical meaning of a sequent 
$T = A_1, \ldots, A_n \vdash B_1, \ldots, B_m$ is the same as the semantical meaning of the 
formula $F_T = (\bigwedge_{i=1}^{n} A_i) \to (\bigvee_{i=1}^{m} B_i)$. The set of free variables for a sequent T 
is defined as $FV(T) = FV(F_T)$. 

The formula image $\iota(T)$ of a sequent T is defined as $\iota(T) = \forall c_1 \ldots \forall c_k.\, F_T$, 
where $FV(T) = \{c_1, \ldots, c_k\}$. Sometimes, we use the schema $\forall_T(\Gamma_T \to \Delta_T)$ to 
abbreviate $\iota(T)$, where $\Gamma_T = \bigwedge_{i=1}^{n} A_i$ and $\Delta_T = \bigvee_{i=1}^{m} B_i$. 
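The definition of the formula image can be made concrete with a small sketch. The tuple encoding of formulae and all function names below are assumptions made for illustration only; in particular, treating an empty antecedent as "true" and an empty succedent as "false" is a convention chosen here, not one stated in the text.

```python
from functools import reduce

# Formulae are nested tuples (an encoding assumed for this sketch):
#   ("atom", name, (free variables...)), ("and", F, G), ("or", F, G),
#   ("imp", F, G), ("all", x, F), ("ex", x, F)

def atom(name, *variables):
    return ("atom", name, tuple(variables))

TRUE, FALSE = atom("true"), atom("false")  # assumed fillers for empty sides

def free_vars(f):
    tag = f[0]
    if tag == "atom":
        return set(f[2])
    if tag in ("and", "or", "imp"):
        return free_vars(f[1]) | free_vars(f[2])
    if tag in ("all", "ex"):
        return free_vars(f[2]) - {f[1]}
    raise ValueError(f"unknown connective: {tag}")

def sequent_formula(antecedent, succedent):
    """F_T: conjunction of the antecedent implies disjunction of the succedent."""
    conj = reduce(lambda a, b: ("and", a, b), antecedent) if antecedent else TRUE
    disj = reduce(lambda a, b: ("or", a, b), succedent) if succedent else FALSE
    return ("imp", conj, disj)

def formula_image(antecedent, succedent):
    """iota(T): universal closure of F_T over the free variables of T."""
    body = sequent_formula(antecedent, succedent)
    for v in sorted(free_vars(body), reverse=True):
        body = ("all", v, body)
    return body

# Example: the sequent P(x), Q(y) |- R(x)
image = formula_image([atom("P", "x"), atom("Q", "y")], [atom("R", "x")])
```

The resulting formula is closed, which is the property the later cut construction relies on: adding it to an antecedent cannot violate an eigenvariable condition.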

As proof systems, we consider the cut-free classical sequent calculus LK as 
well as the two intuitionistic sequent calculi LJmc and LJ. The calculus LK is 
depicted on the left hand side in Fig. 1. We consider negation as defined, that is, 
$\neg A$ is $A \to \bot$, and use an additional $\bot$-axiom. 

Structural rules such as contraction and weakening are not necessary for 
completeness of the above calculi, i.e. we use implicit contraction in the rules 
$\forall l$, $\exists r$, and $\to l$. We have added the weakening rules, since they are convenient 
for proof translations and allow shorter proofs (even in the propositional case) 
if dag proofs are considered [8]. 

The main difference between the intuitionistic sequent calculi LJmc and 
LJ is given by the fact that LJ sequents are restricted to at most one succedent 
formula whereas LJmc sequents are not. For LJmc, we replace the rules $\to r$, $\forall r$, 




564 U. Egly and S. Schmitt 



[Figure: inference rules of LK (left) and the replacement rules for LJmc (top right) and LJ (bottom right).] 

* a must not occur free in the conclusion of $\forall r$, $\exists l$ (eigenvariable condition). 

Fig. 1. The sequent calculi LK, LJmc, and LJ. 



and $\to l$ with the rules shown on top of the right hand side in Fig. 1. For LJ, we 
replace the rules $\vee r$, $\exists r$, and $\to l$ with the rules shown at the bottom of the right 
hand side in Fig. 1. Furthermore, $|\Delta| = 0$ must hold for the ax. rule as well as 
for the remaining right (r) rules. For the rule $\bot$ax. and all left (l) rules, it must 
hold $|\Delta| \le 1$. 

We will also consider LK, LJmc, and LJ extended by the following cut rule 

$$\frac{\Gamma \vdash A, \Delta' \qquad \Gamma, A \vdash \Delta}{\Gamma \vdash \Delta}\ \mathit{cut}\ A$$

where $\Delta = \Delta'$ for LK and LJmc. For LJ, we have $|\Delta| \le 1$ and $\Delta' = \emptyset$. The 
resulting calculi are denoted by LK+cut, LJmc+cut, and LJ+cut, respectively. 
We use LX and LX+cut if we speak about properties that hold for all three 
sequent calculi above. 



Sequent proofs. We distinguish between sequent proofs with trees as the underlying 
structure and proofs with rooted directed acyclic graphs (dags). The former 
ones are more common whereas the latter ones allow more compact proofs. 

A tree proof of a sequent E in each calculus LX is a finite (rooted and 
directed) tree with its nodes labeled with sequents such that the leaf nodes 
are labeled with axioms and the label of each non-leaf node is obtained from the 
label(s) of its successor node(s) by an application of a rule from LX. The sequent 
E labeled at the root is called the end sequent. We say that an end sequent E 
is proven from a set $\Lambda$ of sequents if $\Lambda$ does not contain any axioms and there 
is a tree proof of E such that the sequents associated with the leaves of the tree 
are either axioms or elements from $\Lambda$. 

Dag proofs are considered as a special form of tree proofs. 

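Definition 1. Let $\alpha$ be a tree proof of an end sequent E from $\Lambda = \{\Lambda_1, \ldots, \Lambda_l\}$ 
in a calculus LX, such that each $\Lambda_i$ occurs at least twice in $\alpha$. Then $\alpha$ is a dag 
proof of E, if the following conditions are satisfied: 

1. There is a permutation $\pi$ of $1, \ldots, l$ and a sequence of tree proofs 
$\alpha_{\pi(1)}, \ldots, \alpha_{\pi(l)}$ of end sequents $\Lambda_{\pi(1)}, \ldots, \Lambda_{\pi(l)}$ such that 

a) $\alpha_{\pi(1)}$ is a tree proof of $\Lambda_{\pi(1)}$ from $\emptyset$; 

b) $\alpha_{\pi(k)}$ ($1 < k \le l$) is a tree proof of $\Lambda_{\pi(k)}$ from $\{\Lambda_{\pi(1)}, \ldots, \Lambda_{\pi(k-1)}\}$; 

2. each tree proof $\alpha_{\pi(i)}$ ($i \in \{1, \ldots, l\}$) occurs exactly once in $\alpha$. 

Let $\Lambda(\alpha)$ denote the set $\Lambda$ associated with $\alpha$. We call $\alpha$ a proper dag proof if 
$\Lambda(\alpha) \ne \emptyset$. 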

It is immediately apparent that the graph induced by the construction in Definition 1 
is acyclic. For each $\Lambda_i$ (with possibly more than two occurrences in the dag 
proof), exactly one proof is present. The single occurrence of $\Lambda_i$ with a proof is 
called a reference sequent; its proof is called reference proof. All occurrences of 
$\Lambda_i$ without a proof are called references (to the reference sequent). 
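The distinction between reference sequents and references can be sketched with a small data structure. The class `Node`, the functions, and the artificial family of proofs built by `nested_example` are all hypothetical and only serve to show how the number of sequent occurrences behaves when every reference is replaced by a copy of its reference proof.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    sequent: str
    premises: list = field(default_factory=list)
    ref: bool = False  # True: a leaf that only points to a reference sequent

def seq_dag(node):
    """Sequent occurrences in the dag proof; a reference counts as one leaf."""
    return 1 + sum(seq_dag(p) for p in node.premises)

def seq_unfolded(node, reference_proofs):
    """Sequent occurrences after copying the reference proof into every reference."""
    if node.ref:
        return seq_unfolded(reference_proofs[node.sequent], reference_proofs)
    return 1 + sum(seq_unfolded(p, reference_proofs) for p in node.premises)

def nested_example(n):
    """Toy dag: D_i is inferred from D_{i-1} twice, once proved and once referenced."""
    proofs = {}
    current = Node("D0")  # an axiom leaf
    for i in range(1, n + 1):
        proofs[current.sequent] = current
        current = Node(f"D{i}", [current, Node(current.sequent, ref=True)])
    return current, proofs

root, proofs = nested_example(10)
dag_length = seq_dag(root)                   # grows linearly in n
tree_length = seq_unfolded(root, proofs)     # grows exponentially in n
```

For this toy family the dag has 2n + 1 sequent occurrences while the fully copied tree has 2^(n+1) - 1, which is the kind of nesting that makes plain copying impractical.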

Example 1. Consider the formula $F_2 = A_2 \wedge O_0 \wedge O_1 \wedge O_2 \to C$, where 

$$O_i = \begin{cases} A_0 \to C & \text{if } i = 0, \\ A_i \to (A_{i-1} \vee C) \vee A_{i-1} & \text{if } i = 1, 2, \end{cases}$$

and $A_0, A_1, A_2, C$ are atoms. A dag proof $\alpha^d$ for $F_2$ in the calculus LJ is depicted 
in Fig. 2. The set $\Lambda$ consists of the sequents $S_1 = O_0, O_1, A_1 \vdash C$ and $S_2 = 
O_0, A_0 \vdash C$. The proofs for the two reference sequents $S_1$ and $S_2$ are abbreviated 
by $P_{S_1}$ and $P_{S_2}$. The two framed sequent occurrences in $\alpha^d$ refer to $S_1$ and $S_2$ 
(gray boxes), respectively, and thus, avoid the duplication of the proofs $P_{S_1}$ and 
$P_{S_2}$ in $\alpha^d$. 

Finally, we define two complexity measures for tree and dag proofs. Let $\alpha$ 
be a sequent proof in LX. The length of $\alpha$ is defined by the number of sequent 
occurrences in $\alpha$, written as $\mathrm{seq}(\alpha)$. The size of $\alpha$ is defined by the number of 
character occurrences in $\alpha$, written as $\mathrm{size}(\alpha)$. 

The following definition of a polynomial simulation is adapted from [7]: A 
calculus $P_1$ can polynomially simulate a calculus $P_2$ if there is a polynomial p 
such that the following holds: For every proof of a formula (or sequent) F in 
$P_2$ of size n, there is a proof of (the translation of) F in $P_1$, whose size is not 
greater than p(n). 
[Figure: dag proof $\alpha^d$ of $F_2$ in LJ; the reference proofs $P_{S_1}$ and $P_{S_2}$ are drawn separately, and the two references to $S_1$ and $S_2$ are framed.] 

Fig. 2. Dag proof $\alpha^d$ for $F_2$ from Example 1. 



3 Simulating Dag Proofs by Tree Proofs and Cut 

We present a general method for translating first-order dag proofs in a calculus 
LX into tree proofs with cuts. Furthermore, we show that the size of the resulting 
tree proof is polynomial with respect to the size of the given dag proof. 

The naive way for such a transformation would be to copy subproofs within 
a dag proof until all references have been completely eliminated, yielding the 
desired proof in tree format. Obviously, this approach would not be practicable 
since it could cause an exponential increase of proof size if the reference structure 
in the dag proof occurs in a nested manner. In order to avoid the exponential 
size increase in the worst case, we use applications of the cut rule, which take the 
formula images $\iota(D)$ of the reference sequents D as cut formulae. The following 
lemma follows immediately: 

Lemma 1. Let $S = \Gamma \vdash \Delta$ where $\Gamma = G_1, \ldots, G_n$, $\Delta = D_1, \ldots, D_m$, and 
$FV(S)$ is $\{x_1, \ldots, x_k\}$. Then: 

(i) The sequent $\vdash \iota(S)$ is provable from S in LX with $n + m + k - 1$ sequents. 

(ii) $\Gamma, \iota(S) \vdash \Delta$ is provable in LX with $2 \cdot (m + n) + k - 1$ sequents. 
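As a quick sanity check, the two counts of Lemma 1 can be evaluated for the reference sequents of Example 1. The function names are hypothetical; the formulas are taken from the lemma, and the parameters for $S_1$ and $S_2$ assume, as in Example 1, that all formulae are built from atoms without free variables (k = 0).

```python
def image_intro_length(n, m, k):
    """Lemma 1 (i): sequents needed to derive |- iota(S) from S."""
    return n + m + k - 1

def image_unfold_length(n, m, k):
    """Lemma 1 (ii): sequents in the tree proof of Gamma, iota(S) |- Delta."""
    return 2 * (m + n) + k - 1

# S_1 = O_0, O_1, A_1 |- C  has n = 3 antecedent and m = 1 succedent formulae.
s1_intro, s1_unfold = image_intro_length(3, 1, 0), image_unfold_length(3, 1, 0)

# S_2 = O_0, A_0 |- C  has n = 2 and m = 1.
s2_intro, s2_unfold = image_intro_length(2, 1, 0), image_unfold_length(2, 1, 0)
```

For $S_2$, count (i) is 2, which agrees with deriving $O_0 \wedge A_0 \vdash C$ by $\wedge l$ and then $\vdash O_0 \wedge A_0 \to C$ by $\to r$.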
In the next lemma, we construct a tree proof with cuts from a given dag 
proof. Since we provide a translation for arbitrary first-order proofs, the number 
of sequents in the resulting proof does not only depend on the number of sequents 
in the source proof, but also on the term structure and especially on the number 
of free variables, which occur in formulae of the source proof. Therefore, it is not 
surprising that the number of free variables influences the number of sequents 
occurring in the tree proof with cut. 

Let $\alpha^d$ be a proper dag proof in LX, i.e. $\Lambda(\alpha^d) \ne \emptyset$, and let 

$$m_v(\alpha^d) = \max\{\, |FV(D)| \mid D \in \Lambda(\alpha^d) \text{ is a reference sequent in } \alpha^d \,\}.$$

Let $f_D$ be the number of formulae in the sequent D and let 

$$m_f(\alpha^d) = \max\{\, f_D \mid D \text{ is a sequent in } \alpha^d \,\}.$$

For a tree proof $\alpha$ (with $\Lambda(\alpha) = \emptyset$), we set $m_v(\alpha) = m_f(\alpha) = 0$. Finally, let 
$m_p(\alpha^d)$ denote the total number of occurrences of sequents from $\Lambda(\alpha^d)$ in $\alpha^d$. 
Observe that $m_p(\alpha) = 0$ for a tree proof $\alpha$, and $m_p(\alpha^d) \ge 2$ for a proper dag 
proof $\alpha^d$. 

We prove that dag proofs with cuts can be translated to tree proofs with cuts 
with a polynomial increase of proof size. The use of source proofs with cuts is 
more general than it is required for our application. 

Lemma 2. Let $\alpha^d$ be a dag proof in LX+cut. Then there exists a tree proof $\alpha$ 
of the same end sequent in LX+cut with $\mathrm{size}(\alpha) \le \frac{5}{2} \cdot \mathrm{size}(\alpha^d)^5$. 

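Proof. Let $\alpha^d$ be a dag proof of the end sequent $S = \Gamma \vdash \Delta$ in LX+cut with an 
associated set $\Lambda(\alpha^d)$. We proceed in two steps. 

I. We show by induction on the number $k = |\Lambda(\alpha^d)|$ that 

$$\mathrm{seq}(\alpha) \le \mathrm{seq}(\alpha^d) + 2 \cdot k \cdot m_p(\alpha^d) \cdot \left(m_f(\alpha^d) + m_v(\alpha^d) + \tfrac{k-1}{2}\right).$$

II. We estimate $\mathrm{size}(\alpha)$ from $\mathrm{seq}(\alpha)$ and the size of $\alpha^d$. 

I. The induction proof. 

Base: k = 0. Then $\alpha^d$ is already in tree form, $\alpha = \alpha^d$, $\Lambda(\alpha^d) = \emptyset$, and the 
relation on the number of sequents holds trivially. 

Induction hypothesis. Assume k > 0 and, for each dag proof $\alpha^d$ in LX+cut with 
$|\Lambda(\alpha^d)| = k - 1$, there is a tree proof $\alpha$ in LX+cut such that 

$$\mathrm{seq}(\alpha) \le \mathrm{seq}(\alpha^d) + 2 \cdot (k - 1) \cdot m_p(\alpha^d) \cdot \left(m_f(\alpha^d) + m_v(\alpha^d) + \tfrac{k-2}{2}\right).$$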

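Step. Consider a dag proof $\alpha^d_k$ with $k = |\Lambda(\alpha^d_k)|$. Select arbitrarily from $\Lambda(\alpha^d_k)$ a 
sequent $D_i = \Gamma_i \vdash \Delta_i$ ($1 \le i \le k$), whose proof is $\beta^d_i$, such that this proof is not 
contained in the proof of any other reference sequent (i.e. select a lower-most 
sequent). Let $v_{D_i}$ denote $|FV(D_i)|$ and let $p_{D_i} \ge 2$ be the number of occurrences 
of $D_i$ in $\alpha^d_k$. Observe that there are no references to $D_i$ within $\beta^d_i$ since a dag 
proof does not contain cycles. Cut out the proof $\beta^d_i$ of $D_i$ from $\alpha^d_k$ such that 
only $D_i$ remains. Let $\gamma^d$ denote this intermediate incomplete proof. Extend the 
antecedent of each sequent in $\gamma^d$ by the formula image $\iota(D_i)$ of $D_i$. Since $\iota(D_i)$ 
is a formula without free variables, no eigenvariable conditions are violated. 

Let $\beta_i$ be the tree proof of $D'_i = \Gamma_i, \iota(D_i) \vdash \Delta_i$ according to Lemma 1 (ii). 
Add a copy of $\beta_i$ over each occurrence of $D'_i$ in $\gamma^d$. For the length of $\beta_i$, we 
obtain 

$$\mathrm{seq}(\beta_i) = 2 \cdot f_{D_i} + v_{D_i} - 1 \le 2 \cdot m_f(\alpha^d_k) + m_v(\alpha^d_k) - 1.$$

Next, apply Lemma 1 (i) in order to extend the proof $\beta^d_i$ of $D_i$ to a proof 
of $\vdash \iota(D_i)$. Then we introduce all elements of $\Gamma, \Delta'$ from the end sequent S by 
applications of weakening. Observe that $\Delta'$ is empty in case of LJ+cut, and 
$\Delta' = \Delta$ for LK+cut and LJmc+cut. This results in the sequent $\Gamma \vdash \iota(D_i), \Delta'$. 
For the additional number of sequents $a(\beta^d_i)$, the following relation holds: 

$$a(\beta^d_i) \le f_{D_i} + v_{D_i} - 1 + f_S \le 2 \cdot m_f(\alpha^d_k) + m_v(\alpha^d_k) - 1.$$

Finally, one additional sequent is introduced by the cut rule. Combining the 
modified subproofs yields the following proof which we call $\alpha^d_{k-1}$: 

$$\dfrac{\dfrac{\dfrac{\beta^d_i}{\vdash \iota(D_i)}}{\Gamma \vdash \iota(D_i), \Delta'}\ w \qquad \dfrac{\dfrac{\beta_i}{\Gamma_i, \iota(D_i) \vdash \Delta_i} \ \cdots\ \dfrac{\beta_i}{\Gamma_i, \iota(D_i) \vdash \Delta_i}}{\Gamma, \iota(D_i) \vdash \Delta}\ \gamma^d}{\Gamma \vdash \Delta}\ \mathit{cut}$$

Observe that no new references have been introduced, but all references for $D_i$ 
have been eliminated and therefore, $|\Lambda(\alpha^d_{k-1})| = k - 1$. Consequently, $p_{D_i} \le 
m_p(\alpha^d_k)$. Moreover, $m_p(\alpha^d_{k-1}) \le m_p(\alpha^d_k)$ and $m_v(\alpha^d_{k-1}) = m_v(\alpha^d_k)$ since no new 
free variables have been introduced by our manipulations. For $m_f(\alpha^d_{k-1})$, we 
obtain $m_f(\alpha^d_{k-1}) \le m_f(\alpha^d_k) + 1$ because $\iota(D_i)$ has been added to sequents in $\gamma^d$. 

Let us estimate the increase of length by the above manipulations resulting 
in $\alpha^d_{k-1}$ from $\alpha^d_k$. From $\mathrm{seq}(\alpha^d_{k-1}) = \mathrm{seq}(\alpha^d_k) + p_{D_i} \cdot \mathrm{seq}(\beta_i) + a(\beta^d_i) + 1$ and some 
calculations, we obtain the estimation 

$$\mathrm{seq}(\alpha^d_{k-1}) \le \mathrm{seq}(\alpha^d_k) + 2 \cdot p_{D_i} \cdot \left(m_f(\alpha^d_k) + m_v(\alpha^d_k)\right). \qquad (1)$$

Since $|\Lambda(\alpha^d_{k-1})| = k - 1$, the induction hypothesis applies. We obtain: 

$$\mathrm{seq}(\alpha) \le \mathrm{seq}(\alpha^d_{k-1}) + 2 \cdot (k - 1) \cdot m_p(\alpha^d_{k-1}) \cdot \left(m_f(\alpha^d_{k-1}) + m_v(\alpha^d_{k-1}) + \tfrac{k-2}{2}\right).$$

Using (1), $p_{D_i} \le m_p(\alpha^d_k)$, $m_p(\alpha^d_{k-1}) \le m_p(\alpha^d_k)$, $m_f(\alpha^d_{k-1}) \le m_f(\alpha^d_k) + 1$, and 
$m_v(\alpha^d_{k-1}) = m_v(\alpha^d_k)$, the induction proof concludes as follows: 

$$\begin{aligned} \mathrm{seq}(\alpha) &\le \mathrm{seq}(\alpha^d_k) + 2 \cdot m_p(\alpha^d_k) \cdot \left(m_f(\alpha^d_k) + m_v(\alpha^d_k)\right) \\ &\quad + 2 \cdot (k - 1) \cdot m_p(\alpha^d_k) \cdot \left(m_f(\alpha^d_k) + 1 + m_v(\alpha^d_k) + \tfrac{k-2}{2}\right) \\ &= \mathrm{seq}(\alpha^d_k) + 2 \cdot k \cdot m_p(\alpha^d_k) \cdot \left(m_f(\alpha^d_k) + m_v(\alpha^d_k) + \tfrac{k-1}{2}\right). \end{aligned}$$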

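II. The estimation of $\mathrm{size}(\alpha)$. 

Let us simplify some parameters from the above inequality. We can replace $2 \cdot k$ 
by $\mathrm{seq}(\alpha^d_k)$ because, for each reference sequent, at least one additional reference 
must occur in $\alpha^d_k$. With $m_p(\alpha^d_k) \le \mathrm{seq}(\alpha^d_k)$, we calculate 

$$\mathrm{seq}(\alpha) \le \mathrm{seq}(\alpha^d_k)^2 \cdot \left(m_f(\alpha^d_k) + m_v(\alpha^d_k) + \tfrac{k+1}{2}\right).$$

The size of any sequent in $\alpha$ is bounded by $(k + 1) \cdot \mathrm{size}(\alpha^d_k)$ because k 
additional formula images occur in any sequent, each of these images has size 
$\le \mathrm{size}(\alpha^d_k)$ and the original part of the sequent has size $\le \mathrm{size}(\alpha^d_k)$. Therefore, 
$\mathrm{size}(\alpha) \le (k + 1) \cdot \mathrm{size}(\alpha^d_k) \cdot \mathrm{seq}(\alpha)$. With $\mathrm{seq}(\alpha^d_k), (k + 1), m_f(\alpha^d_k), m_v(\alpha^d_k) \le 
\mathrm{size}(\alpha^d_k)$, we obtain $\mathrm{size}(\alpha) \le \frac{5}{2} \cdot \mathrm{size}(\alpha^d_k)^5$. □ 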

From the constructive proof of Lemma 2, we can derive a recursive 
polynomial-time procedure which translates a given dag proof in LX into a tree 
proof in LX+cut. In particular, no additional search is involved since the reference 
sequents in the dag proof completely guide the process of cut introduction. 
The estimation of the increase of proof size is rather generous; in most prac- 
tical cases, the increase is much less than the worst-case bound. The resulting 
tree proof can then be interpreted directly by an interactive proof assistant. We 
conclude with 

Theorem 1. Tree LX+cut polynomially simulates dag LX+cut. 
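The algebra behind the induction in step I can be checked mechanically. The sketch below assumes that the overhead bound has the closed form $2 \cdot k \cdot m_p \cdot (m_f + m_v + \frac{k-1}{2})$ and that one induction step costs $2 \cdot m_p \cdot (m_f + m_v)$ while increasing $m_f$ by one; both function names are hypothetical. It only confirms that these two readings agree with each other, not that the bound is tight.

```python
from fractions import Fraction

def closed_form(k, mp, mf, mv):
    """Assumed closed form of the overhead: 2*k*mp*(mf + mv + (k-1)/2)."""
    return 2 * k * mp * (mf + mv + Fraction(k - 1, 2))

def unrolled(k, mp, mf, mv):
    """Overhead obtained by applying the induction step k times."""
    if k == 0:
        return Fraction(0)
    # One step adds at most 2*mp*(mf + mv) sequents; the remaining proof has
    # k-1 reference sequents, and its sequents are at most one formula longer.
    return 2 * mp * (mf + mv) + unrolled(k - 1, mp, mf + 1, mv)

agreement = all(
    closed_form(k, mp, mf, mv) == unrolled(k, mp, mf, mv)
    for k in range(0, 8)
    for mp in range(2, 5)
    for mf in range(1, 5)
    for mv in range(0, 3)
)
```

Exact rational arithmetic is used so that the term $\frac{k-1}{2}$ does not introduce rounding when k is even.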



Refinements of the translation. So far, we have presented a general method 
to translate arbitrary first-order dag proofs in LX into tree proofs with cuts. It is 
also possible to introduce the cuts locally and not necessarily immediately above 
the end sequent of the proof. The cuts are introduced where they are needed, 
avoiding the extension of each sequent in the whole proof with all formula images. 
This results in a tree proof with an improved structure. 

The local introduction of cuts works as follows: From a given dag proof $\alpha^d_k$ of 
F in LX, we first select an arbitrary reference sequent $D_i = \Gamma_i \vdash \Delta_i$. 
Then, we search the top-most sequent $T = \Gamma \vdash \Delta$ such that all occurrences of 
$D_i$ are within the proof of T. The cut rule is now introduced directly above T 
such that T becomes the conclusion of the cut. The sequents $\Gamma \vdash \iota(D_i), \Delta'$ and 
$\Gamma_i, \iota(D_i) \vdash \Delta_i$ are proven in the same way as in the proof of Lemma 2. Then, 
the following proof corresponds to $\alpha^d_{k-1}$ in the proof of Lemma 2, where $\Delta' = \emptyset$ 
in LJ+cut and $\Delta' = \Delta$ in LJmc+cut and LK+cut: 

$$\dfrac{\Gamma \vdash \iota(D_i), \Delta' \qquad \dfrac{\Gamma_i, \iota(D_i) \vdash \Delta_i \ \cdots\ \Gamma_i, \iota(D_i) \vdash \Delta_i}{\Gamma, \iota(D_i) \vdash \Delta}}{\Gamma \vdash \Delta}\ \mathit{cut}$$

The following example illustrates this local cut introduction in the calculus 
LJ+cut. 

Example 2. We apply the translation to the LJ dag proof $\alpha^d$ (Fig. 2) for the 
formula $F_2$ from Example 1. For the two reference sequents $S_1, S_2$, we obtain 
as cut formulae the formula images $\iota(S_1) = O_0 \wedge O_1 \wedge A_1 \to C$ and $\iota(S_2) = 
O_0 \wedge A_0 \to C$. The resulting replacements leading to a tree proof $\alpha$ in LJ+cut 
are shown in Fig. 3. The subproof starts with the sequent $T_1 = \Gamma_{T_1} \vdash C$ below 
the first cut introduction, where $\Gamma_{T_1} = O_0, O_1, (A_1 \vee C) \vee A_1$. The second cut 
introduction takes place above the sequent $T_2 = \Gamma_{T_2} \vdash C$ in cut proof 1, where 
$\Gamma_{T_2} = O_0, (A_0 \vee C) \vee A_0$ (gray boxes in Fig. 3). The cut proof $\alpha$ can be completed 
by copying the first 7 inferences (starting from the end sequent) from the dag 
proof in Fig. 2. 



4 Cuts and the Structure of Programs 

A central application for intuitionistic logic is the extraction of programs from 
proofs. Constructive program synthesis relies on the parallel process of program 
construction and program verification. Formalizing a logical specification within 
a constructive logic, e.g., Intuitionistic Type Theory (ITT) [16], this specification 
“formula” will first be proven valid using a sequent calculus for ITT. More 
precisely, one finds a constructive proof for the existence of a function f which 
maps input elements to output elements of the specified program. Then, in a 
second step, f will be extracted from the computational content of the proof 
according to the “proofs-as-programs” paradigm [2]. Hence, f forms a correctly 
verified program term with respect to the given specification. The whole process 
is performed interactively within a constructive program development system, 
for example, the NuPRL system [3,1]. 





[Figure: the subproof below the first cut on $\iota(S_1)$, cut proof 1 (which contains the second cut on $\iota(S_2)$), cut proof 2, and the cut applications 1 and 2.] 

Fig. 3. The resulting tree proof $\alpha$ with cuts from Example 2. 
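LJ: 

$\Gamma, x : A \vdash A\ \lfloor\mathrm{ext}\ x\rfloor$ (ax.) 

$\Gamma, x : \bot \vdash A\ \lfloor\mathrm{ext}\ \mathrm{any}(x)\rfloor$ ($\bot$ax.) 

$\dfrac{\Gamma \vdash A\ \lfloor\mathrm{ext}\ a\rfloor \qquad \Gamma \vdash B\ \lfloor\mathrm{ext}\ b\rfloor}{\Gamma \vdash A \wedge B\ \lfloor\mathrm{ext}\ \langle a, b\rangle\rfloor}\ \wedge r$ 

$\dfrac{\Gamma \vdash A\ \lfloor\mathrm{ext}\ a\rfloor}{\Gamma \vdash A \vee B\ \lfloor\mathrm{ext}\ \mathrm{inl}(a)\rfloor}\ \vee r_1$ 

$\dfrac{\Gamma \vdash B\ \lfloor\mathrm{ext}\ b\rfloor}{\Gamma \vdash A \vee B\ \lfloor\mathrm{ext}\ \mathrm{inr}(b)\rfloor}\ \vee r_2$ 

$\dfrac{\Gamma, x : A \vdash B\ \lfloor\mathrm{ext}\ b\rfloor}{\Gamma \vdash A \to B\ \lfloor\mathrm{ext}\ \lambda x. b\rfloor}\ \to r$ 

$\dfrac{\Gamma, x' : T \vdash A[x \backslash x']\ \lfloor\mathrm{ext}\ a\rfloor}{\Gamma \vdash \forall x : T.\, A\ \lfloor\mathrm{ext}\ \lambda x'. a\rfloor}\ \forall r\ x'\,{}^{*}$ 

$\dfrac{\Gamma \vdash A[x \backslash t]\ \lfloor\mathrm{ext}\ a\rfloor}{\Gamma \vdash \exists x : T.\, A\ \lfloor\mathrm{ext}\ \langle t, a\rangle\rfloor}\ \exists r\ t\,{}^{**}$ 

$\dfrac{\Gamma \vdash C\ \lfloor\mathrm{ext}\ t\rfloor}{\Gamma, x : T \vdash C\ \lfloor\mathrm{ext}\ t\rfloor}\ wl$ 

$\dfrac{\Gamma, a : A, b : B \vdash C\ \lfloor\mathrm{ext}\ u\rfloor}{\Gamma, z : A \wedge B \vdash C\ \lfloor\mathrm{ext}\ \mathrm{let}\ \langle a, b\rangle = z\ \mathrm{in}\ u\rfloor}\ \wedge l$ 

$\dfrac{\Gamma, a : A \vdash C\ \lfloor\mathrm{ext}\ u\rfloor \qquad \Gamma, b : B \vdash C\ \lfloor\mathrm{ext}\ v\rfloor}{\Gamma, z : A \vee B \vdash C\ \lfloor\mathrm{ext}\ \mathrm{case}\ z\ \mathrm{of}\ \mathrm{inl}(a) \mapsto u \mid \mathrm{inr}(b) \mapsto v\rfloor}\ \vee l$ 

$\dfrac{\Gamma, pf : A \to B \vdash A\ \lfloor\mathrm{ext}\ a\rfloor \qquad \Gamma, y : B \vdash C\ \lfloor\mathrm{ext}\ b\rfloor}{\Gamma, pf : A \to B \vdash C\ \lfloor\mathrm{ext}\ b[y \backslash pf\ a]\rfloor}\ \to l$ 

$\dfrac{\Gamma, pf : \forall x : T.\, A,\ y : A[x \backslash t] \vdash C\ \lfloor\mathrm{ext}\ u\rfloor}{\Gamma, pf : \forall x : T.\, A \vdash C\ \lfloor\mathrm{ext}\ u[y \backslash pf\ t]\rfloor}\ \forall l\ t\,{}^{**}$ 

$\dfrac{\Gamma, x' : T, a : A[x \backslash x'] \vdash C\ \lfloor\mathrm{ext}\ u\rfloor}{\Gamma, z : \exists x : T.\, A \vdash C\ \lfloor\mathrm{ext}\ \mathrm{let}\ \langle x', a\rangle = z\ \mathrm{in}\ u\rfloor}\ \exists l\ x'\,{}^{*}$ 

LJ+cut: 

$\dfrac{\Gamma \vdash T\ \lfloor\mathrm{ext}\ s\rfloor \qquad \Gamma, x : T \vdash C\ \lfloor\mathrm{ext}\ t\rfloor}{\Gamma \vdash C\ \lfloor\mathrm{ext}\ (\lambda x. t)\ s\rfloor}\ \mathit{cut}\ x\ T$ 

* a must not occur free in the conclusion of $\forall r$, $\exists l$ (eigenvariable condition). 

** t must satisfy the declaration condition for $\forall l$ and $\exists r$: All free variables in t must be declared in $\Gamma$. 

Fig. 4. The calculus LJ with extract term annotations. 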

Logic and type theory. Whereas ITT is designed as a higher-order logic in 
order to express all kinds of reasoning about programs and mathematics, its first- 
order fragment corresponds to first-order intuitionistic logic. According to the 
“formulas-as-types paradigm” [13], every (intuitionistic) formula corresponds to 
a type. Elements of these types can be constructed by proving the corresponding 
formulae in LJ using the “Curry-Howard isomorphism” [5,20]. 

In order to model this process within the calculus LJ, we need to provide 
two kinds of modifications. First, every calculus rule will be annotated with some 
term pattern for program (term) extraction. For instance, the $\wedge r$ rule will look 
like 

$$\frac{\Gamma \vdash A\ \lfloor\mathrm{ext}\ a\rfloor \qquad \Gamma \vdash B\ \lfloor\mathrm{ext}\ b\rfloor}{\Gamma \vdash A \wedge B\ \lfloor\mathrm{ext}\ \langle a, b\rangle\rfloor}\ \wedge r$$

and is interpreted as follows: If term a can be extracted from the proof for $\Gamma \vdash A$, 
that is, a has been constructed as an element of data type A, and term b can be 
extracted from the proof for $\Gamma \vdash B$, then the pair $\langle a, b\rangle$ will be extracted from 
$\Gamma \vdash A \wedge B$. 

For the second modification to integrate program extraction into LJ, we have 
to represent type elements as part of sequents. Every free (first-order) variable 
in a sequent $S = \Gamma \vdash C$ has to be declared in the hypotheses $\Gamma$. That means, 
the variable denotes an (object) parameter which has already been constructed 
during the proof. For simplicity, we assume a general object type T for all first-order 
variables and parameters, i.e. we do not consider multi-typed first-order 
logic. Then, a parameter declaration is an expression of the form c : T, where 
c is a first-order parameter and T is our general object type. Moreover, every 
formula F in $\Gamma$ interpreted as a type is part of a declaration and thus, declares 
an (object) variable z of type F. Hence, a formula declaration is an expression 
of the form z : F, where z is a variable and F is a (first-order) formula. 

We redefine a sequent as an ordered pair $\Gamma \vdash C$, where C is a formula 
and $\Gamma$ is a set of parameter and formula declarations, where all objects are 
pairwise distinct. Let $S = \Gamma \vdash C$ be a sequent with the free variables $FV(S) = 
\{c_1, \ldots, c_m\}$, and let $x_1 : S_1, x_2 : S_2, \ldots, x_n : S_n$ be the formula declarations in 
$\Gamma$. Then, the closed sequent $S_c$ of S is defined as 

$$S_c = c_1 : T, \ldots, c_m : T, x_1 : S_1, x_2 : S_2, \ldots, x_n : S_n \vdash C.$$

The set $\{c_1, \ldots, c_m\}$ is also called the set of declared parameters in $S_c$. In the 
following, we assume exclusively closed sequents in sequent proofs. For this, we 
will use “sequent” instead of “closed sequent” when it is clear from the context. 

The modifications in LJ take place at the quantifier rules. First-order formulae 
of the form $Qx.\, F$ will be extended by the object type to $Qx : T.\, F$, for 
$Q \in \{\forall, \exists\}$. The rules $\exists l\ x'$ and $\forall r\ x'$ add a new declaration $x' : T$ to $\Gamma$ for the 
introduced eigenvariable $x'$. The rules $\exists r\ t$ and $\forall l\ t$ have to respect the declaration 
condition: All free variables in t must already have been declared in $\Gamma$. 
That is, all (first-order) objects must have been constructed in the proof before 
using them for quantifier instantiations. 

The complete sequent calculi LJ and LJ+cut with declarations and extract 
term annotations are shown in Fig. 4. We use the $\lambda$-calculus as “programming” 
language for the extract terms. The syntax used in Fig. 4 is for improving readability 
only. It is derived from the programming language ML and corresponds to 
the standard display form in the NuPRL system [1]. An actual program construction 
from a closed sequent $\Gamma \vdash C$ works as follows: First, we prove the validity 
of $\Gamma \vdash C$. Then, we extract the program term by stepwise instantiation of the 
annotated term patterns at each applied inference rule in the proof (starting 
from the axiom rules). 
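The stepwise instantiation of term patterns can be sketched for a handful of rules. The functions below are hypothetical string builders that follow the patterns of the annotated rules (ax., $\wedge r$, $\wedge l$, $\to r$); the ASCII pair syntax `<a, b>` and the example proof are choices made for this sketch.

```python
# Extract-term patterns for a few rules, written as plain string builders.

def ax(x):
    # Gamma, x : A |- A   extracts the declared variable x
    return x

def and_r(a, b):
    # from extracts a (for A) and b (for B) build the pair for A & B
    return f"<{a}, {b}>"

def and_l(a, b, z, u):
    # z : A & B is split into a : A and b : B inside the extract u
    return f"let <{a}, {b}> = {z} in {u}"

def imp_r(x, b):
    # discharging the hypothesis x : A yields a lambda abstraction
    return f"λ{x}.{b}"

def swap_program():
    """Extract term for a proof of A & B -> B & A, built from the axioms upward."""
    a, b = ax("a"), ax("b")            # two axiom leaves
    pair = and_r(b, a)                 # a : A, b : B |- B & A
    body = and_l("a", "b", "z", pair)  # z : A & B |- B & A
    return imp_r("z", body)            # |- A & B -> B & A

program = swap_program()
```

Reading the calls in `swap_program` from top to bottom mirrors the direction stated above: the extract term is assembled starting at the axioms and ending at the end sequent.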

Extracting procedural programs from LJ+cut proofs. The cut-based 
transformation from Section 3 converts a dag proof $\alpha^d$ in LJ into a tree proof $\alpha$ 
in LJ+cut. The proof $\alpha$ can be transformed easily into a proof for the extended 
calculus LJ with closed sequents, containing formula and parameter declarations. 
Then, program extraction can be performed using the appropriate term 
annotations at the inference rules.¹ In particular, a modular structured program 
can be automatically synthesized from the sequent proof by taking advantage of 
the introduced cut rules. 

¹ As pointed out by a referee, formulations of sequent calculi are computationally 
sensitive. The use of multisets and implicit contraction in LJ from Section 2 suppresses 
computational anomalies when switching to the annotated calculus in Fig. 4 (see 
[21,23] for details). 

Table 1. Formula declarations for subformulae and cut formulae of Example 3. 

Subformulae of $F_2 = A_2 \wedge O_0 \wedge O_1 \wedge O_2 \to C$: 

  $c : C$ 
  for $i = 0, 1, 2$:  $a_i : A_i$ and $pf_i : O_i$ 
  for $i = 1, 2$:  $z_i : A_{i-1} \vee C$ and $y_i : (A_{i-1} \vee C) \vee A_{i-1}$ 
  $x_1 : A_2 \wedge O_0$ 
  $x_2 : A_2 \wedge O_0 \wedge O_1$ 
  $x_3 : A_2 \wedge O_0 \wedge O_1 \wedge O_2$ 

Cut formula $\iota(S_1) = O_0 \wedge O_1 \wedge A_1 \to C$: 

  $s'_1 : O_1 \wedge A_1$ 
  $s_1 : O_0 \wedge O_1 \wedge A_1$ 
  $pf_{S_1} : O_0 \wedge O_1 \wedge A_1 \to C$ 

Cut formula $\iota(S_2) = O_0 \wedge A_0 \to C$: 

  $s_2 : O_0 \wedge A_0$ 
  $pf_{S_2} : O_0 \wedge A_0 \to C$ 

In our transformation, a cut rule will only be applied in order to factorize 
subproofs of identical sequents. That is, a cut introduction provides a proof for 
(the formula image of) a certain sequent S, whereas the cut applications “apply” 
this proof whenever an identical sequent S occurs in the remaining proof. As this 
process is known as lemma application in proof theory, it can be transferred to 
procedural programming concepts on the side of program extraction. Consider 
the cut rule with its extract terms: 

$$\frac{\Gamma \vdash T\ \lfloor\mathrm{ext}\ s\rfloor \qquad \Gamma, x : T \vdash C\ \lfloor\mathrm{ext}\ t\rfloor}{\Gamma \vdash C\ \lfloor\mathrm{ext}\ (\lambda x. t)\ s\rfloor}\ \mathit{cut}\ x\ T$$

In the resulting program $(\lambda x. t)\ s$, the term s is extracted from the proof of 
the cut formula (lemma) T and thus, s represents a procedure. Instead of repeating 
the code for s within the (main) program t at every occurrence of x, 
the abstraction $\lambda x. t$ avoids duplication. Program evaluation, for instance in 
the NuPRL system, is based on lazy evaluation. In our case, $\beta$-reduction is first 
applied to $(\lambda x. t)\ s$. Then, lazy evaluation proceeds on the result $t[x \backslash s]$. Thus, 
every occurrence of x in t simulates a call-by-name of the procedure s (see [3] 
for details). From this viewpoint, our cut-based proof transformation allows the 
automatic generation of procedural and hence, short programs with the effect 
that the procedures will not be unfolded before runtime. 
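The reading of $(\lambda x. t)\ s$ as a procedure call can be mimicked in an ordinary language. In the sketch below, the lemma's program `s` is passed as a zero-argument function (a thunk), so each occurrence of `x` in the main program triggers one evaluation, which is the call-by-name behaviour described above. All names and the returned number are illustrative assumptions; this is not NuPRL code.

```python
def run_cut_program():
    calls = 0

    def s():
        # Program extracted from the lemma; counts how often it is evaluated.
        nonlocal calls
        calls += 1
        return 21

    def t(x):
        # Main program with two occurrences of x; x is a thunk, so the
        # code of s is written once but evaluated at each occurrence.
        return x() + x()

    result = (lambda x: t(x))(s)  # mirrors the extract term (λx.t) s
    return result, calls

result, calls = run_cut_program()
```

The code of `s` appears once in the program text although it is evaluated twice at run time, which is the sense in which the procedure is not unfolded before runtime.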

Example 3. Reconsider the cut proof for $F_2$ from Example 2, as shown in Fig. 3 
together with the first 7 inferences (starting from the end sequent) of Fig. 2. 
We now develop the extract term for this proof by using the annotated sequent 
calculus from Fig. 4. The formula declarations for the subformulae of $F_2$ as well 
as for the two cut formulae $\iota(S_1)$ and $\iota(S_2)$ are shown in Table 1. The resulting 
program terms extracted from the proof fragments of Fig. 3 and Fig. 2 are 
depicted in Table 2. The procedures in the final program term are the two cut 
proofs $cp_1$ and $cp_2$, where $cp_2$ occurs within $cp_1$ due to the local introduction 
of cuts in our refined translation. The two cut introductions are represented 
by the subterms $cut_1$ and $cut_2$, respectively. The $\lambda$-abstraction in $cut_1$ avoids 
duplication of the code for $cp_1$ (two gray boxes), i.e. replacing the function 
variable $pf_{S_1}$ in the subterm $ca_1$ with $cp_1$ (see gray box in cut application 1). 
Similarly, the $\lambda$-abstraction in $cut_2$ avoids duplication of the code for $cp_2$. Since 
$cut_2$ occurs within the procedure $cp_1$, the two cut introductions prevent the code 
for $cp_2$ from being copied four times, two times in each copy of $cp_1$, and thus, yield a 
procedural and short program. 

Table 2. Procedural program terms extracted from the cut proof of Example 3. 

cut application 2: 
  $ca_2 = pf_{S_2}\ \langle pf_0, a_0\rangle$ 

cut proof 2: 
  $cp_2 = \lambda s_2.\ \mathrm{let}\ \langle pf_0, a_0\rangle = s_2\ \mathrm{in}\ pf_0\ a_0$ 

cut application 1: 
  $ca_1 = pf_{S_1}\ \ldots$ (the argument of $pf_{S_1}$ is not legible in the source) 

cut proof 1: 
  $cp_1 = \lambda s_1.\ \mathrm{let}\ \langle pf_0, s'_1\rangle = s_1\ \mathrm{in}\ (\mathrm{let}\ \langle pf_1, a_1\rangle = s'_1\ \mathrm{in}\ cut_2)$, where 
  $cut_2 = (\lambda pf_{S_2}.\ \mathrm{case}\ pf_1\ a_1\ \mathrm{of}\ \mathrm{inl}(z_1) \mapsto (\mathrm{case}\ z_1\ \mathrm{of}\ \mathrm{inl}(a_0) \mapsto ca_2 \mid \mathrm{inr}(c) \mapsto c) \mid \mathrm{inr}(a_0) \mapsto ca_2)\ cp_2$ 
  (with the substitution $[y_1 \backslash pf_1\ a_1]$) 

main proof (final program term), containing: 
  $cut_1 = (\lambda pf_{S_1}.\ \mathrm{case}\ pf_2\ a_2\ \mathrm{of}\ \mathrm{inl}(z_2) \mapsto (\mathrm{case}\ z_2\ \mathrm{of}\ \mathrm{inl}(a_1) \mapsto ca_1 \mid \mathrm{inr}(c) \mapsto c) \mid \mathrm{inr}(a_1) \mapsto ca_1)\ cp_1$ 
  (with the substitution $[y_2 \backslash pf_2\ a_2]$) 
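The count of four copies can be reproduced with plain string manipulation. In the sketch below, the placeholders `P1` and `P2` stand for the function variables bound by the two cuts, `ARGS` stands for the arguments of the first cut application (left unspecified here), and the helper `unfold` is hypothetical; the term strings follow the shape of the terms in Example 3 but are simplified.

```python
def unfold(body, name, procedure):
    """Replace every call site `name` by the full code of the procedure."""
    return body.replace(name, procedure)

cp2 = "(λs2. let <pf0, a0> = s2 in pf0 a0)"
cut2_body = ("case pf1 a1 of inl(z1) -> (case z1 of inl(a0) -> P2 <pf0, a0>"
             " | inr(c) -> c) | inr(a0) -> P2 <pf0, a0>")
cut1_body = ("case pf2 a2 of inl(z2) -> (case z2 of inl(a1) -> P1 ARGS"
             " | inr(c) -> c) | inr(a1) -> P1 ARGS")
prefix = "(λs1. let <pf0, s1'> = s1 in let <pf1, a1> = s1' in "

# With cuts: each procedure is written once and bound by a lambda abstraction.
cp1_shared = prefix + "(λP2. " + cut2_body + ") " + cp2 + ")"
main_shared = "(λP1. " + cut1_body + ") " + cp1_shared

# Without cuts: every call site receives its own copy of the procedure code.
cp1_unfolded = prefix + unfold(cut2_body, "P2", cp2) + ")"
main_unfolded = unfold(cut1_body, "P1", cp1_unfolded)

copies_shared = main_shared.count(cp2)
copies_unfolded = main_unfolded.count(cp2)
```

With sharing the code of `cp2` occurs once; after unfolding it occurs twice in each of the two copies of the first procedure.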



5 Conclusion, Related and Future Work 

We have established a polynomial simulation of dag proofs by tree proofs with 
cuts for classical and intuitionistic sequent calculi. This simulation leads to a 
polynomial-time computable translation procedure which is used to integrate 
forward-directed automated theorem provers into interactive proof development 
systems based on the usual classical and intuitionistic sequent calculi (in tree 
form). The outputs of these theorem provers correspond directly to sequent 
proofs in dag form and thus, guide the translation without involving any addi- 
tional search. We have presented a central application of our translation, namely 
the generation of modular programs from first-order intuitionistic proofs. Here, 
the introduction of cuts is interpreted as a procedural programming concept with 
a call-by-name of the procedures when evaluating the resulting programs. 
A completely different approach for translating dag proofs into tree proofs 
with cuts in classical and intuitionistic propositional sequent systems can be 
found in [14]. Letz’ transformation is based on the idea to simulate the applications 
of inference rules by valid rule sequents and cuts. Take $\vee l$ as an example. 
The premises of this rule are $S_1$ and $S_2$, respectively, and the conclusion is S. 
Then the rule sequent for $\vee l$ has the form $R = \iota(S_1), \iota(S_2) \vdash \iota(S)$. The simulation 
starts with the last inference in the dag proof and proceeds in a backward manner. 
Applications of rules are simulated by cuts and multiple occurrences of the 
same subproof can be eliminated by contraction. The sketched approach fails for 
full first-order logic because the rule sequents for quantifiers with eigenvariable 
conditions are not valid. 

In the future, we plan to implement our translation procedure for the in- 
tegration of forward-directed proof procedures into interactive proof assistants. 
Furthermore, we will investigate the question whether this translation schema 
can be generalized and applied to other non-classical logics and proof calculi as 
well. For many modal sequent calculi, cut formulae in the antecedent must be
“protected” from destructive effects of inference rules. Consider for example the
$\Box r$ rule in a sequent calculus for the modal logic S4, shown on the left hand side
below, where $\Gamma^* = \{\Box B \mid \Box B \in \Gamma\}$ and $\Delta^* = \{\Diamond B \mid \Diamond B \in \Delta\}$. In order to avoid
the deletion of the cut formula by an application of $\Box r$, one has to protect the
cut formula by $\Box$.

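$$\frac{\Gamma^* \vdash F, \Delta^*}{\Gamma \vdash \Box F, \Delta}\ \Box r \qquad\qquad \frac{\Gamma, \iota(S), \forall(\bigwedge\Gamma \rightarrow \bigvee\Delta) \vdash \Delta}{\Gamma, \iota(S) \vdash \Delta}\ \Box l$$

The formula image $\iota(S)$ for a sequent $S = \Gamma \vdash \Delta$ is then of the form $\Box\,\forall(\bigwedge\Gamma \rightarrow \bigvee\Delta)$.
The proof of the sequent $\Gamma, \iota(S) \vdash \Delta$ ends with an application of the $\Box l$
rule (see right hand side above). For sequent calculi for S4, a modified translation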
schema exists. We will investigate the applicability of the translation schema to 
other kinds of non-classical logics with classical as well as intuitionistic logic as 
basis. 
Abstract. We propose a new method for using recurrent schematizations
in Theorem Proving. We provide techniques for detecting cycles in
proofs (via proof generalization), and we show how to take advantage
of the expressive power of schematizations in order to avoid generating
such cycles explicitly. This may shorten proofs and avoid divergence in
some cases. These techniques are more general than existing ones, and 
unlike them, they can be used with any kind of proof procedure (using 
tableaux-based approaches as well as resolution-based ones). 



1 Motivations 

The concept of term schematization was originally introduced by Chen,
Hsiang and Kong (see for example [1,2]). The intended goal was to denote, using
recurrent expressions, infinite sets of structurally similar terms, obtained
by iterating a given context along some particular paths. Schematizations have 
been developed in order to avoid non-termination and divergence, in existing 
symbolic computation procedures, for example in Rewriting (e.g. to handle the 
case in which the Knuth-Bendix procedure diverges), in Logic Programming (to 
express finitely the set of solutions of a non-terminating query) or in Automated 
Deduction (for giving a finite description of an infinite derivation). Since then, 
improvements have been proposed to the initial language, and several different 
formalisms are now available, with various complexities and expressive powers 
[1,3,11,10,5]. However, only a few papers have dealt with the actual use
of these formalisms in Automated Deduction. Since the unification problem is
decidable for most of them, it is of course possible to integrate them into
existing proof procedures. This enables the user to specify his/her problem in a
more expressive language. However, in order to take full advantage of the
expressive power of schematizations, it is necessary to go further, and to define rules
that automatically integrate such schematizations into the clause set at
hand. This should indeed avoid repeated applications of the same sequence of
rules, thus reducing the search space and avoiding divergence. For example, a 
theorem prover should be able to deduce automatically the formula F' , given 
the formula F above, instead of blindly enumerating the set of ground facts
{even(0), odd(succ(0)), even(succ(succ(0))), ...}.

For this purpose, some techniques have been proposed, in order to detect and 
avoid loops during the proof search. 

In [11], a rule is defined in order to avoid repeated applications of the resolution
rule. Its principle is the following. Given a binary self-resolving clause of the
form $P\lambda \leftarrow P$ and a clause $P' \leftarrow B_1, \ldots, B_k$ such that $P$ and $P'$ have an m.g.u.
$\mu$, it is possible to deduce a clause of the form $P\lambda^n\mu \leftarrow B_1\mu, \ldots, B_k\mu$, provided
that $\lambda$ satisfies some conditions allowing the atom $P\lambda^n\mu$ to be expressed finitely, using
P-terms. For example, given the clauses $\neg P(x) \vee P(f(x))$ and $P(a)$, it is possible
to compute the clause $P(f^n(a))$ (where $n$ denotes a new integer variable). In
[12], the more expressive formalism of primal grammars is used in order to control
self-application of binary clauses. The proposed method deals with clauses of
the form $A\lambda_1 \rightarrow A\lambda_2$ (where $\lambda_1, \lambda_2$ satisfy some additional conditions) and
computes the set of resolvents $A\lambda_1^n \rightarrow A\lambda_2^n$ that can be deduced from such clauses.
In [8], a similar approach is introduced for extending existing model building
procedures. An inductive inference rule is defined in order to simulate repeated
applications of the resolution rule on self-resolving clauses, and to give a finite
description of Herbrand models of clause sets for which the original method did
not terminate.

Still, all these techniques are very restricted. First, they do not take into
account the use of the equality predicate and paramodulation rule (as well as
similar macro-inference rules coming from specific axioms). Consider for example
the clause $f(x, g(y)) = f(h(x), y)$. One should be able to derive the clause
$f(x, g^n(y)) = f(h^n(x), y)$, which is not possible using the techniques above. Second,
they do not allow crossed recursion. Consider for example the following set of
clauses: $S = \{\neg P(x) \vee \neg Q(x) \vee P(f(x)),\ \neg P(x) \vee \neg Q(x) \vee Q(f(x)),\ P(a),\ Q(a)\}$. It
is clear that for all non-negative integers $n$, the clauses $P(f^n(a))$ and $Q(f^n(a))$
are logical consequences of $S$. Therefore, it should be possible to deduce the
clauses $(\forall n)P(f^n(a))$ and $(\forall n)Q(f^n(a))$ explicitly. However, the reader can easily
check that none of the techniques mentioned above can deduce such clauses.
Intuitively speaking, this is due to the fact that crossed recursion is needed to
deduce $P(f^n(a))$ and $Q(f^n(a))$ (i.e. $P(f^{n-1}(a))$ is needed to prove $Q(f^n(a))$
and conversely).
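
The crossed recursion in $S$ can be observed by naive forward chaining on ground atoms. The following Python sketch is only an illustration under stated assumptions: the function name `saturate`, the encoding of $P(f^n(a))$ as the pair `("P", n)` and the depth bound are hypothetical and not part of the techniques discussed here.

```python
def saturate(max_depth):
    """Forward-chain the two rules of S on ground atoms up to a depth bound.

    An atom is encoded as a pair (predicate, n) standing for predicate(f^n(a)).
    """
    facts = {("P", 0), ("Q", 0)}  # the unit clauses P(a) and Q(a)
    changed = True
    while changed:
        changed = False
        for n in range(max_depth):
            # Both rules need P(f^n(a)) and Q(f^n(a)) at the same time:
            # this is the crossed recursion.
            if ("P", n) in facts and ("Q", n) in facts:
                for pred in ("P", "Q"):
                    if (pred, n + 1) not in facts:
                        facts.add((pred, n + 1))
                        changed = True
    return facts


facts = saturate(5)
```

Each level of the enumeration needs both atoms of the previous level, which is why a rule that looks at a single self-resolving clause cannot summarize the whole set.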

Third, the above techniques may only be used if a self-resolving clause explicitly
occurs in the clause set. This is a very restrictive condition if the
method is to be integrated into a calculus that does not share the lemma-building
property of the resolution method, or even if restriction strategies (such as
semantic resolution) are used to prune the search space (which is very often the 
case in practice, for obvious efficiency reasons). 

Consider for example the clause set: $S' = \{c_1 : P(a),\ c_2 : (\forall x)\neg P(x) \vee \neg Q(x),\ c_3 : (\forall x) Q(x) \vee P(f(x))\}$.

Here, the clause $c_4 : (\forall x)\neg P(x) \vee P(f(x))$ does not occur in $S'$, but can
be deduced from $S'$ by applying the resolution rule between the clauses $c_2$
and $c_3$. Then, the inductive rule can be applied, yielding the clause
$c_5 : (\forall x, n)\neg P(x) \vee P(f^n(x))$. By applying the resolution rule between $c_5$ and $c_1$ we
generate the clause $c_6 : P(f^n(a))$ and then (with $c_2$) the clause $c_7 : \neg Q(f^n(a))$,
which leads to termination (if an appropriate redundancy checking mechanism
such as subsumption is used).

However, if the clause $c_4$ is not generated, then the generation of the clauses
$(\forall x, n)\neg P(x) \vee P(f^n(x))$, $P(f^n(a))$ and $\neg Q(f^n(a))$ would be impossible, which
leads to non-termination of the resolution process.

Unfortunately, most of the proof procedures implemented in running systems
would not generate a clause such as $c_4$. Of course, provers based on the
tableaux or connection method, or on model elimination, will never explicitly
generate such a formula (instead, they would for example compute the terms
$P(f(a))$, $P(f(f(a)))$, $P(f(f(f(a))))$, etc. from $P(a)$). However, even if we restrict
ourselves to resolution-based provers, it is well known that restriction conditions
are often added to the resolution rule in order to prune the search space and make
the system more efficient. Such restrictions will frequently prevent the generation of
clauses such as $c_4$. For example, most (trivial) semantic restrictions (such as positive
or negative hyper-resolution) would prevent the application of the resolution
rule between $c_2$ and $c_3$. Again, only the ground clauses $P(f(a)), P(f(f(a))), \ldots$
(or $\neg Q(a), \neg Q(f(a)), \ldots$) will be generated.

This leads to a very paradoxical and unnatural situation: restricting the ap-
plication of the inference rules can actually make the whole proof search process 
non-terminating. 

The calculus presented in [6] in the context of semantic tableaux cannot deal
with these examples because it still relies on the explicit generation of
self-resolvent clauses¹ (called cycle unification clauses in [6]). The calculus described
in [13] in the context of HyperTableaux is closer to our approach. It is applicable
to tableaux-based procedures, and can handle examples such as the ones presented
above. Moreover, it also deals with more powerful schematization techniques
where the integer exponents may be non-linear expressions (which entails that
unification problems may become undecidable). However, it does not use proof
generalisation techniques. Moreover, only some very specific cycles of order 1 (see
below) are considered.

Therefore, more general and sophisticated techniques are needed in order to 
overcome these limitations and to control cycles that are much more complex 
than previously investigated ones. This is particularly important if we want 
to integrate the use of term schematizations into tableau-based procedures. The 
present paper presents a solution to this problem. The proposed techniques make it possible
to integrate the use of term schematizations into all existing proof procedures and
in particular into those that do not explicitly generate self-resolving clauses.
Due to space restrictions, proofs are not included in the paper. Some of the 
proofs and other examples can be found in [9]. 



¹ Since this is a tableau calculus, this entails that the rule can be applied only on
input clauses.
2 Basic Notions and Definitions

2.1 Term Schematizations

As mentioned before, there exist several languages for denoting sets of structurally
similar terms. Among all the existing formalisms, we have chosen the
language of terms with integer exponents (I-terms for short), which were
introduced by Comon [3] and extended to terms with several “holes” in [8]. We
consider this language to be a good compromise between “expressive power” and
“simplicity”.

In this section, we recall the definition of I-terms (syntax and semantics).
Let $\Sigma$ be a set of function symbols (including constants), $\vartheta$ be a set of ordinary
variables and $\vartheta_N$ be a set of integer variables. Let arity be a function mapping
each symbol $f$ in $\Sigma$ to a natural number (the arity of $f$).

The set of arithmetic expressions $\mathcal{N}$ is defined as usual. Ground arithmetic
expressions of the form $s^n(0)$ (resp. $s^n(x)$) will be simply denoted by $n$ (resp.
$x + n$).

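The set of ordinary terms $T(\Sigma, \vartheta)$, the set of I-terms $T_I(\Sigma, \vartheta)$ and the set
of terms with several holes $T_\diamond(\Sigma, \vartheta)$ are the smallest sets satisfying the following
conditions:

- $\vartheta \subseteq T_I(\Sigma, \vartheta)$.

- $\vartheta \subseteq T(\Sigma, \vartheta)$.

- $\diamond \in T_\diamond(\Sigma, \vartheta)$.

- If $f \in \Sigma$, $arity(f) = n$ and $(t_1, \ldots, t_n) \in T_I(\Sigma, \vartheta)^n$, then $f(t_1, \ldots, t_n) \in T_I(\Sigma, \vartheta)$
(if $n = 0$, i.e. if $f$ is a constant symbol, $f(t_1, \ldots, t_n)$ is to be read as the term $f$).

- If $f \in \Sigma$, $arity(f) = n$ and $(t_1, \ldots, t_n) \in T(\Sigma, \vartheta)^n$, then $f(t_1, \ldots, t_n) \in T(\Sigma, \vartheta)$.

- If $f \in \Sigma$, $arity(f) = n$, $(t_1, \ldots, t_n) \in (T_I(\Sigma, \vartheta) \cup T_\diamond(\Sigma, \vartheta))^n$ and if there exists
$i \in [1..n]$ such that $t_i \in T_\diamond(\Sigma, \vartheta)$, then $f(t_1, \ldots, t_n) \in T_\diamond(\Sigma, \vartheta)$.

- If $t \in T_\diamond(\Sigma, \vartheta)$, $n \in \mathcal{N}$, $s \in T_I(\Sigma, \vartheta)$, then $t^n.s \in T_I(\Sigma, \vartheta)$.

Let $t \in T_\diamond(\Sigma, \vartheta)$ and let $s \in T_I(\Sigma, \vartheta)$. We denote by $t(s)$ the term obtained
by replacing each occurrence of $\diamond$ in $t$ by $s$. We denote by $\mathcal{R}$ the following system
of rewriting rules: $\{t^0.s \rightarrow s,\ t^{n+1}.s \rightarrow t(t^n.s)\}$.

It is immediate to see that $\mathcal{R}$ is terminating and confluent. We denote by
$t\!\downarrow$ the normal form of $t$ w.r.t. $\mathcal{R}$.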
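
The rewriting system above can be made concrete for exponents that are plain non-negative integers. The following Python sketch is an assumption-laden illustration, not the paper's implementation: terms are encoded as nested tuples `(symbol, arg1, ...)`, variables and constants as strings, the hole as the string `"<>"`, and an I-term $t^n.s$ as the hypothetical tuple `("exp", t, n, s)` (so no function symbol may be called `exp`).

```python
HOLE = "<>"  # encoding of the hole symbol (an assumption of this sketch)


def fill(t, s):
    """Return t(s): replace every occurrence of the hole in t by s."""
    if t == HOLE:
        return s
    if isinstance(t, tuple):
        return (t[0],) + tuple(fill(arg, s) for arg in t[1:])
    return t  # a variable or constant name


def normalize(term):
    """Normal form w.r.t. t^0.s -> s and t^(n+1).s -> t(t^n.s), n a concrete integer."""
    if isinstance(term, tuple) and term[0] == "exp":
        _, context, n, base = term
        context = normalize(context)  # unfold I-terms nested inside the context
        result = normalize(base)
        for _ in range(n):
            result = fill(context, result)
        return result
    if isinstance(term, tuple):
        return (term[0],) + tuple(normalize(arg) for arg in term[1:])
    return term


# f(<>)^3.a unfolds to f(f(f(a)))
unfolded = normalize(("exp", ("f", HOLE), 3, "a"))
```

Contexts with several holes are handled by `fill`, which replaces every occurrence of the hole at once.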



2.2 Formulae 

Let $\vartheta_B$ be a set of boolean variables², and let $\Omega$ be a set of predicate symbols.

Definition 1. The set of first-order formulae $\psi_I(\Omega, \Sigma, \vartheta)$ is the least set such
that:

- $\vartheta_B \subseteq \psi_I(\Omega, \Sigma, \vartheta)$.

- If $p \in \Omega$, $arity(p) = n$, $(t_1, \ldots, t_n) \in T_I(\Sigma, \vartheta)^n$, then $p(t_1, \ldots, t_n) \in \psi_I(\Omega, \Sigma, \vartheta)$.

- If $F \in \psi_I(\Omega, \Sigma, \vartheta)$, then $\neg F$ is in $\psi_I(\Omega, \Sigma, \vartheta)$.

- If $F_1, F_2 \in \psi_I(\Omega, \Sigma, \vartheta)$, then $F_1 \wedge F_2$ and $F_1 \vee F_2$ are in $\psi_I(\Omega, \Sigma, \vartheta)$.

- If $F \in \psi_I(\Omega, \Sigma, \vartheta)$ and $x \in (\vartheta \cup \vartheta_N)$, then $(\forall x)F$ and $(\exists x)F$ are in $\psi_I(\Omega, \Sigma, \vartheta)$.

Formulae are interpreted as usual. Any term of the form $t^n.s$ is interpreted
as its normal form w.r.t. $\mathcal{R}$.

² The use of boolean variables in the context of the present paper will become clear
when we define the proof generalization techniques (see Section 4.1).

We now introduce the notion of mixed unification problem, which will be used
later to encode conditions about formulae: a mixed unification problem (m.u.p.
for short) is either false or a (possibly empty) finite conjunction of equations
of the form F = G, where F, G are first-order formulae, or t = s, where t, s are
terms (note that empty conjunctions are interpreted as true).

M.u.p. are said to be “mixed” because they mix two kinds of equations: equations
between terms (as in standard unification problems) and equations between
first-order formulae. Both kinds of equations are interpreted syntactically, i.e. =
means syntactical identity between expressions (up to a renaming of bound
variables).

A substitution $\sigma$ is said to be a solution of $\mathcal{P} = \bigwedge_{i=1}^{n} t_i = s_i$ iff for all
$i \in [1..n]$, $t_i\sigma$ is syntactically equivalent to $s_i\sigma$ (up to a renaming of quantified
variables). Since semantic properties of first-order formulae are not taken into
account in the definition, an m.u.p. can be solved using standard unification rules
plus specific rules to handle the case of quantified formulae (see [9] for details).
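
To illustrate the "standard unification rules" part only, here is a minimal Python sketch of syntactic unification over a conjunction of equations. It is not the procedure of [9]: quantified formulae and I-terms are not handled, connectives would simply be treated as function symbols, and the encoding (nested tuples, variables as strings starting with an underscore, the names `solve` and `resolve`) is an assumption made for this example.

```python
def is_var(t):
    """Variables are strings starting with an underscore (Prolog-like convention)."""
    return isinstance(t, str) and t.startswith("_")


def walk(t, subst):
    """Follow variable bindings until an unbound variable or a non-variable is reached."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def occurs(v, t, subst):
    """Occurs check: does variable v occur in t under the current bindings?"""
    t = walk(t, subst)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, subst) for a in t[1:])


def solve(equations):
    """Return a substitution (in triangular form) solving all equations, or None."""
    subst = {}
    stack = list(equations)
    while stack:
        left, right = stack.pop()
        left, right = walk(left, subst), walk(right, subst)
        if left == right:
            continue
        if is_var(left):
            if occurs(left, right, subst):
                return None
            subst[left] = right
        elif is_var(right):
            stack.append((right, left))
        elif (isinstance(left, tuple) and isinstance(right, tuple)
              and left[0] == right[0] and len(left) == len(right)):
            stack.extend(zip(left[1:], right[1:]))  # decompose argument-wise
        else:
            return None  # symbol clash
    return subst


def resolve(t, subst):
    """Apply the triangular substitution exhaustively to t."""
    t = walk(t, subst)
    if isinstance(t, tuple):
        return (t[0],) + tuple(resolve(a, subst) for a in t[1:])
    return t


solution = solve([(("p", "_X", ("f", "_Y")), ("p", ("g", "a"), ("f", "b")))])
```

A return value of `None` corresponds to the m.u.p. false.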

3 The Calculus 

Before presenting our technique, we need to introduce a calculus for constructing
and denoting proofs. The chosen calculus is a very basic version of tableaux
(without unification) . It should be emphasized that it is not intended to be used 
for actually generating proofs. Instead, the proofs should be obtained using more 
efficient existing calculi (such as an efficient tableaux calculus, resolution, model 
elimination etc.) and then translated into our formalism (this can be done in an 
efficient way). This technique avoids the necessity to define several versions of 
the algorithm, one for each proof procedure. Naturally, from a practical point of 
view, explicit translation of the proof should probably be avoided in an imple- 
mentation. 

The following definition introduces the standard notion of tree labelled by 
formulae. 

A position is a (possibly infinite and possibly empty) sequence of natural
integers. “p.q” denotes the concatenation of the positions p, q and $\epsilon$ denotes the
empty position. $\prec$ denotes the prefix ordering between positions, i.e. $p \prec q$ iff
there exists a non-empty position $p'$ such that $p.p' = q$.

A formula tree $T$ is a function mapping each finite position p to a multiset of
formulae such that, for any pair of finite positions (p, q), if $p \prec q$ and $T(p) = \emptyset$
then $T(q) = \emptyset$. The notion of branch, closed branch, etc. is defined as usual.
For any formula tree $T$ and for any finite position p in $T$, we denote by $T|_p$ the
formula tree defined as follows: for any position q, we have, by definition:

$$T|_p(q) = T(p.q)$$

If $T$ and $T'$ are two formula trees and p is a finite position, we denote by
$T[p \leftarrow T']$ the formula tree defined as follows:

$$T[p \leftarrow T'](q) =_{def} \begin{cases} T'(p') & \text{if } q = p.p' \\ T(q) & \text{otherwise} \end{cases}$$
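
The two operations on formula trees can be sketched for finite trees as follows. This is only an illustrative encoding, assumed for the example: a tree is a Python dictionary from positions (tuples of integers, the empty tuple being the empty position) to lists of formulae, and a position absent from the dictionary is mapped to the empty multiset. The names `subtree` and `replace` are hypothetical.

```python
def subtree(tree, p):
    """T|p : maps each position q to T(p.q)."""
    return {q[len(p):]: forms for q, forms in tree.items() if q[:len(p)] == p}


def replace(tree, p, new_tree):
    """T[p <- T']: T'(p') at every position p.p', T(q) at all other positions."""
    # Drop everything at or below p, then graft the new tree at p.
    result = {q: forms for q, forms in tree.items() if q[:len(p)] != p}
    result.update({p + q: forms for q, forms in new_tree.items()})
    return result


T = {(): ["A or B"], (1,): ["A"], (2,): ["B"]}
grafted = replace(T, (2,), {(): ["C"], (1,): ["D"]})
```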

Roughly speaking, we call “proof tree” a formula tree in which each node
is generated from an existing one using the usual inference rules of tableaux
($\forall$-rule, $\vee$-rule, $\wedge$-rule, etc.) including the “cut” rule. The following is the formal
definition of this notion.

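Definition 2. A proof tree is a formula tree $T$ such that for any finite position
p such that $T(p) \neq \emptyset$, one of the following conditions holds.

- Either for all integers i, $T(p.i) = \emptyset$. In this case p is said to be a leaf node for $T$.

- Or $T(p) = S \cup \{F_1 \vee F_2\}$, $T(p.i) = S \cup \{F_i\}$ if $i = 1, 2$ and $T(p.i) = \emptyset$ otherwise (p is
said to be a $\vee$-node on $F_1 \vee F_2$).

- Or $T(p) = S \cup \{F_1 \wedge F_2\}$, $T(p.1) = S \cup \{F_1, F_2\}$ and $T(p.i) = \emptyset$ if $i \neq 1$ ($\wedge$-node
on $F_1 \wedge F_2$).

- Or $T(p) = S \cup \{(\forall x)F\}$, $x \in \vartheta$, $t$ is a ground I-term, $T(p.1) = S \cup \{(\forall x)F, F\{x \rightarrow t\}\}$
and $T(p.i) = \emptyset$ if $i \neq 1$ ($\forall$-node on $(\forall x)F$ and $t$).

- Or $T(p) = S \cup \{(\forall x)F\}$, $x \in \vartheta_N$, $t$ is a ground arithmetic expression, $T(p.1) =
S \cup \{(\forall x)F, F\{x \rightarrow t\}\}$ and $T(p.i) = \emptyset$ if $i \neq 1$ ($\forall$-node on $(\forall x)F$ and $t$).

- Or $T(p) = S \cup \{(\exists x)F\}$, $t$ is a constant symbol not occurring in $T(p)$, $T(p.1) =
S \cup \{F\{x \rightarrow t\}\}$ and $T(p.i) = \emptyset$ if $i \neq 1$ ($\exists$-node on $(\exists x)F$).

- Or $T(p) = S \cup \{\neg(F_1 \wedge F_2)\}$, $T(p.1) = S \cup \{(\neg F_1) \vee (\neg F_2)\}$ and $T(p.i) = \emptyset$ if $i \neq 1$
($\neg$-$\wedge$-node on $\neg(F_1 \wedge F_2)$).

- Or $T(p) = S \cup \{\neg(F_1 \vee F_2)\}$, $T(p.1) = S \cup \{(\neg F_1) \wedge (\neg F_2)\}$ and $T(p.i) = \emptyset$ if $i \neq 1$
($\neg$-$\vee$-node on $\neg(F_1 \vee F_2)$).

- Or $T(p) = S \cup \{\neg\neg F\}$, $T(p.1) = S \cup \{F\}$ and $T(p.i) = \emptyset$ if $i \neq 1$ ($\neg$-$\neg$-node on
$\neg\neg F$).

- Or $T(p) = S \cup \{\neg(\forall x)F\}$, $T(p.1) = S \cup \{(\exists x)\neg F\}$ and $T(p.i) = \emptyset$ if $i \neq 1$ ($\neg$-$\forall$-node
on $\neg(\forall x)F$).

- Or $T(p) = S \cup \{\neg(\exists x)F\}$, $T(p.1) = S \cup \{(\forall x)\neg F\}$ and $T(p.i) = \emptyset$ if $i \neq 1$ ($\neg$-$\exists$-node
on $\neg(\exists x)F$).

- Or $T(p) = S \cup \{F, \neg F'\}$, $T(p.1) = \{\bot\}$ and $T(p.i) = \emptyset$ if $i \neq 1$, if $F$ and $F'$ have the
same normal form w.r.t. $\mathcal{R}$ (closing node on $F$ and $\neg F$).

- Or $T(p) = S$, $T(p.1) = S \cup \{F\}$, $T(p.2) = S \cup \{\neg F\}$ and $T(p.i) = \emptyset$ if $i \neq 1, 2$
and $F$ is any formula (cut node on $F$).

A proof tree of a set of first-order formulae $S$ is a proof tree $T$ such that
$T(\epsilon) = S$.

A proof tree is said to be complete iff it contains no non-closed leaf.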
A proof tree T is said to be fair iff no rule is infinitely delayed (i.e. if a rule
is applicable at a given finite position p in T on a formula F, then it must be
applied on the same formula in all non-closed branches of the form p.q in T).

It is well-known that this calculus is sound and refutationally complete. Of
course, allowing integer variables in the formulae makes the calculus incomplete.
Indeed, the satisfiability problem is actually undecidable for formulae containing 
integer variables (since they may encode Peano arithmetic). 

It is well-known that the proofs constructed using any existing proof proce- 
dure such as resolution, tableaux (with unification), sequent calculus, the con- 
nection method etc., can be translated into a closed proof tree. Due to the use of 
the cut rule, this can be done in polynomial time, w.r.t. the size of the original 
proof which makes this approach effective. This is also true if the paramodulation 
rule is added to the calculus, since it can be simulated by repeated application 
of the substitution axioms. 

4 The Inductive Rule 

We now have all we need to define our new inductive rule. Intuitively
speaking, its principle can be summarized as follows: 

1. Try - during the search for a proof - to detect a potential cycle in the proof 
tree, i.e. a sequence of inference rules that can be applied repeatedly, hence 
that can lead to non termination. Potential cycles are detected using proof 
generalization, by constructing the most general form of the corresponding 
sequence of inference rules. 

2. Construct (using terms with integer exponents) the general form of the for- 
mulae deduced during the repeated applications of the sequence of inference 
rules. 

This technique makes it possible to express infinite branches finitely. Thus, divergence
is avoided in some cases.

4.1 Proof Generalization 

We first describe our proof generalization algorithm. It takes as input a proof
tree $T$ and computes a new formula tree $T^g$ such that $T$ is an instance of $T^g$,
and a m.u.p. $\mathcal{P}^g$ expressing constraints on the variables occurring in $T^g$ so that
$T^g$ is a proof tree.

For any formula $F$ of the form $(Qx)G$ (where $Q \in \{\exists, \forall\}$), we denote by
$A(F)$ the formula obtained from $G$ by replacing all terms (resp. formulae) not
containing the variable $x$ by new, pairwise distinct variables in $\vartheta$ (resp. $\vartheta_B$) (this
is a kind of variable abstraction, as originally introduced by Baader and Schulz,
performed on all terms not containing $x$).

Let $T$ be a finite proof tree. Let $\Phi$ be a function mapping all formulae $F$ to
distinct variables in $\vartheta_B$. For any finite position p, we inductively define the pair
$(T^g(p), \mathcal{P}_p)$ as follows (starting from the leaves to the root).
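- If $T(p)$ is empty then $T^g(p) =_{def} \emptyset$.

- Assume that for all integers i, $(T^g(p.i), \mathcal{P}_{p.i})$ is constructed. Then we define
$(T^g(p), \mathcal{P}_p)$ as follows.

$T^g(p) =_{def} \{\Phi(F) \mid F \in T(p)\}$. Moreover:

  - If p is a leaf, then $\mathcal{P}_p =_{def} \top$.

  - If p is a $\neg$-$\neg$-node on $\neg\neg F$, then $\mathcal{P}_p =_{def} \mathcal{P}_{p.1} \wedge \Phi(\neg\neg F) = \neg\neg\Phi(F)$.

  - If p is a $\neg$-$\vee$-node on $\neg(F_1 \vee F_2)$, then $\mathcal{P}_p =_{def} \mathcal{P}_{p.1} \wedge \Phi(\neg(F_1 \vee F_2)) =
  \neg(\Phi(F_1) \vee \Phi(F_2)) \wedge \Phi(\neg F_1 \wedge \neg F_2) = \neg\Phi(F_1) \wedge \neg\Phi(F_2)$.

  - If p is a $\neg$-$\wedge$-node on $\neg(F_1 \wedge F_2)$, then $\mathcal{P}_p =_{def} \mathcal{P}_{p.1} \wedge \Phi(\neg(F_1 \wedge F_2)) =
  \neg(\Phi(F_1) \wedge \Phi(F_2)) \wedge \Phi(\neg F_1 \vee \neg F_2) = \neg\Phi(F_1) \vee \neg\Phi(F_2)$.

  - If p is a $\neg$-$\forall$-node on $\neg(\forall x)F$, then $\mathcal{P}_p =_{def} \mathcal{P}_{p.1} \wedge \Phi(\neg(\forall x)F) =
  \neg(\forall x)F' \wedge F' = A(F) \wedge \Phi((\exists x)\neg F) = (\exists x)\neg F'$.

  - If p is a $\neg$-$\exists$-node on $\neg(\exists x)F$, then $\mathcal{P}_p =_{def} \mathcal{P}_{p.1} \wedge \Phi(\neg(\exists x)F) =
  \neg(\exists x)F' \wedge F' = A(F) \wedge \Phi((\forall x)\neg F) = (\forall x)\neg F'$.

  - If p is a $\vee$-node on $F_1 \vee F_2$, then $\mathcal{P}_p =_{def} \mathcal{P}_{p.1} \wedge \mathcal{P}_{p.2} \wedge \Phi(F_1 \vee F_2) = \Phi(F_1) \vee
  \Phi(F_2)$.

  - If p is a $\wedge$-node on $F_1 \wedge F_2$, then $\mathcal{P}_p =_{def} \mathcal{P}_{p.1} \wedge \Phi(F_1 \wedge F_2) = \Phi(F_1) \wedge \Phi(F_2)$.

  - If p is a closing node on $F, \neg F$, then $\mathcal{P}_p =_{def} \neg\Phi(F) = \Phi(\neg F)$.

  - If p is a cut node on $F$, then $\mathcal{P}_p =_{def} \Phi(\neg F) = \neg\Phi(F) \wedge \mathcal{P}_{p.1} \wedge \mathcal{P}_{p.2}$.

  - If p is a $\forall$-node on $(\forall x)F$, then $\mathcal{P}_p =_{def} \Phi((\forall x)F) = (\forall x)A(F) \wedge
  \Phi(F\{x \rightarrow t\}) = A(F)\{x \rightarrow y\} \wedge \mathcal{P}_{p.1}$ where $y$ is a new variable.

  - If p is a $\exists$-node on $(\exists x)F$, then $\mathcal{P}_p =_{def} \Phi((\exists x)F) = (\exists x)A(F) \wedge
  \Phi(F\{x \rightarrow t\}) = A(F)\{x \rightarrow f(x_1, \ldots, x_n)\} \wedge \mathcal{P}_{p.1}\{y \rightarrow f(x_1, \ldots, x_n)\}$
  where $f$ is a new function symbol and $x_1, \ldots, x_n$ are the free variables
  in $T^g(p)$.

$(T^g, \mathcal{P}_\epsilon)$ is called a generalization of $T$.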

Lemma 1. Let $T$ be a proof tree. Let $S$ be a set of quantifier-free formulae. Let
$(T^g, \mathcal{P})$ be a generalization of $T$.

1. For any solution $\sigma$ of $\mathcal{P}$, $T^g\sigma$ is a proof tree.

2. There exists a solution $\theta$ of $\mathcal{P}$ such that $T^g\theta = T$.

Example 1. Let us consider the formula $(\forall x, y)(P(g(a), x, y) \rightarrow
P(g(a), f(x), g(y))) \wedge P(g(a), a, b)$. We build the following (obviously not
complete) proof tree³.

p(g(a),a,b) and all(x, all(y, not p(g(a),x,y) or p(g(a),f(x),g(y))))
p(g(a),a,b)
all(x, all(y, not p(g(a),x,y) or p(g(a),f(x),g(y))))
all(y, not p(g(a),a,y) or p(g(a),f(a),g(y)))
not p(g(a),a,b) or p(g(a),f(a),g(b))
    not p(g(a),a,b)
    false

    p(g(a),f(a),g(b))

³ This proof tree (as all other examples in this paper) has been constructed
with the interactive theorem prover μ-IPS_ATINF (a micro Interactive Prover with
Schematization) described in Section 5.

The corresponding generalization of this proof tree is (after solution of the 
constraints and simplification): 

p(_X,_Y,_Z) and all(x, all(y, not p(_X,x,y) or p(_X,f(x),g(y))))
p(_X,_Y,_Z)
all(x, all(y, not p(_X,x,y) or p(_X,f(x),g(y))))
all(y, not p(_X,_Y,y) or p(_X,f(_Y),g(y)))
not p(_X,_Y,_Z) or p(_X,f(_Y),g(_Z))
    not p(_X,_Y,_Z)
    false

    p(_X,f(_Y),g(_Z))

Following standard Prolog conventions, expressions starting with an underscore
denote variables.



4.2 Detecting Cycles and Computing Schematizations 

In this section, we identify criteria sufficient to detect potential cycles and to 
ensure that the re-construction of the infinite branch using term schematizations 
is possible. 

A substitution $\theta$ is said to be simply recursive iff for all variables $x \in \mathcal{D}(\theta)$:

1. $x$ occurs in $x\theta$;

2. $x$ does not occur in a subterm of the form $t^n.s$ in $x\theta$;

3. and for any $y \in \mathcal{D}(\theta) \setminus \{x\}$, $y$ does not occur in $x\theta$.

In contrast to the similar notions of directly recursive [11] or mono-cyclic
substitution [12], we do not require the substitution $\theta$ to be variable-preserving.
However, we will not rename the variables introduced by the substitution $\theta$ at
each step, which will ensure that the obtained set of terms can be denoted finitely
using I-terms. Similarly, we also assume that all cycles are of length 1 (i.e. we
do not allow substitutions of the form $x \rightarrow f(y), y \rightarrow g(x)$). This does not entail
any loss of generality, because, as shown in [12], cycles of length greater than 1
may be reduced to cycles of length 1 (using unfolding).

Lemma 2. If $\theta$ is simply recursive, then for any natural number $n$ and for all
variables $x \in \mathcal{D}(\theta)$, we have $x\theta^n = t^n.x$ where $t = x\theta\{x \rightarrow \diamond\}$.
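
The statement of Lemma 2 can be checked on small instances. The following Python sketch is an illustration under assumptions made for this example only (terms as nested tuples, variables and constants as strings, the hole as `"<>"`, hypothetical helper names): it iterates a simply recursive substitution and compares the result with the unfolding of the corresponding I-term for concrete values of n.

```python
HOLE = "<>"  # encoding of the hole (an assumption of this sketch)


def substitute(term, sigma):
    """Apply the substitution sigma (a dict from names to terms) to term."""
    if isinstance(term, tuple):
        return (term[0],) + tuple(substitute(a, sigma) for a in term[1:])
    return sigma.get(term, term)


def iterate(x, theta, n):
    """Compute x theta^n by applying theta n times."""
    term = x
    for _ in range(n):
        term = substitute(term, theta)
    return term


def unfold(context, n, base):
    """Unfold the I-term context^n.base for a concrete integer n."""
    term = base
    for _ in range(n):
        term = substitute(context, {HOLE: term})
    return term


# theta = {Y -> f(Y), Z -> g(Z)} is simply recursive.
theta = {"Y": ("f", "Y"), "Z": ("g", "Z")}
context_y = substitute(theta["Y"], {"Y": HOLE})  # the context t = f(<>)
```

For instance, `iterate("Y", theta, 3)` and `unfold(context_y, 3, "Y")` both build f(f(f(Y))).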



Two Kinds of Proof Cycles 

In this section, we give the definition of proof cycles and we identify syntactical
criteria for detecting them. Informally speaking, a proof cycle is a sequence
of inference rules (or a subtree) that can be repeatedly applied, leading to a proof
tree of arbitrary size (hence to divergence). Our goal is to “eliminate” these
cycles, by expressing these arbitrary proof trees in a symbolic way (using integer
variables and term schematizations). To this purpose, an important restriction
is that we explicitly require that these proof trees contain only a fixed
number of (distinct) open branches (the conditions in the definition of proof
cycles below will ensure that this property is always satisfied).

Indeed, having to consider proof trees containing N open branches, when
N is not a fixed integer but a term whose value depends on the value of the
variables in the proof tree, would require a specific schematization language for
denoting those trees and would therefore lead to a much more difficult treatment
of trees. This additional source of complexity could possibly outweigh the
advantage of using term schematizations (see also Section 6).



Proof Cycles of Order 1. Proof cycles of order 1 are the simplest possible
cycles. They occur when a given branch contains a sequence of applications
of inference rules that can be generalized in such a way that these rules can be
applied again on the conclusion of the sequence. This is well illustrated by Figure
1: since the actual value of the term $f(a)$ is not relevant for the application of
the inference rules in the considered sequence, we may also apply these rules on
$f(f(a))$, which leads to a loop.



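[Figure 1: tableau starting from P(a) and ¬P(x) ∨ P(f(x)). The instance ¬P(a) ∨ P(f(a)) yields a closed branch on ¬P(a) and an open branch containing P(f(a)); the instance ¬P(f(a)) ∨ P(f(f(a))) then yields P(f(f(a))), and so on up to P(f(f(...(a)))).]

Fig. 1. A simple example of a proof cycle of order 1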



Since we want to deal with proof trees having a fixed number of branches, 
we explicitly require that all branches parallel to the considered one in the gen- 
eralized cycle must not contain any variable. As we shall see, this ensures that 
all these branches can be merged into a single one. 

Definition 3. Let $T$ be a proof tree. A cycle of order 1 in $T$ w.r.t. $S_{init}, \theta, \eta$ is
a pair $(p, q)$ such that $p \prec q$ and satisfying the following conditions:

1. $(T^g, \mathcal{P}^g)$ is a generalization of $T$.

2. $\sigma$ is the most general solution of $\mathcal{P}^g$.

3. $T^g(p)\sigma$ is of the form $S \cup S_{init}$.

4. $T^g(q)\sigma$ is of the form $S \cup S_{init} \cup S_{init}\theta \cup S'$.

5. $\theta$ is simply recursive.

6. $S\sigma$ does not contain any variable in $\mathcal{D}(\theta)$.

7. For any position $q'$ such that $p \prec q'$ and $q \parallel q'$, $T^g(q')\sigma$ contains no variable in
$\mathcal{D}(\theta)$.

8. $\eta$ is the substitution such that $T^g\sigma\eta = T$ ($\eta$ must exist by Lemma 1, point 2).

9. $q$ is a leaf in $T$.



Proof Cycles of Order 2. Proof cycles of order 2 are slightly more complicated.
They occur when a sequence of inference rules leads to a conclusion $C$ that
is “similar” (in a sense to be formally defined: they are two instances of the
same scheme) to the negation of a formula occurring in an open branch in the
corresponding subtree. Then, this sequence of inference rules may be repeatedly
applied from the leaf containing $C$. This leads to a finite number of open
branches, since the branch containing $\neg C$ will be closed.



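[Figure 2: tableau on the clause ¬P(x,g(y)) ∨ P(f(x),y). The instance ¬P(a,g(g(b))) ∨ P(f(a),g(b)) splits into the branches ¬P(a,g(g(b))) and P(f(a),g(b)); the latter is extended with the instance ¬P(f(a),g(b)) ∨ P(f(f(a)),b), giving ¬P(f(a),g(b)) and P(f(f(a)),b). The two families of formulae generalize to ¬P(a,g(g(g(...(b))))) and P(f(f(...(a))),b).]

Fig. 2. A simple example of a proof cycle of order 2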



Definition 4. Let $T$ be a proof tree. A cycle of order 2 in $T$ w.r.t. $S_{init}, \theta, \eta$
is a tuple $(p, q, q')$ such that $p \prec q$, $p \prec q'$, $q \parallel q'$ and satisfying the following
conditions:

1. $(T^g, \mathcal{P}^g)$ is the generalization of $T$.

2. $\sigma$ is the most general solution of $\mathcal{P}^g$.

3. $T^g(p)\sigma = S'$.

4. T<^{q')(J = S U S" U I F G Si„«F}. 

5. $T^g(q)\sigma$ is of the form $S \cup S_{init} \cup S_{init}\theta \cup S'$.

6. $\theta$, $\theta'$ are simply recursive and of disjoint domains.

7. $S\sigma$ and $S''\sigma$ do not contain any variable in $\mathcal{D}(\theta)$.

8. For any position $q''$ such that $p \prec q''$ and $q \parallel q''$, $q'' \parallel q'$, $T^g(q'')\sigma$ contains no
variable in $\mathcal{D}(\theta)$.

9. $\eta$ is the substitution such that $T^g\sigma\eta = T$ ($\eta$ must exist by Lemma 1, point 2).

10. $q$ and $q'$ are two leaves in $T$.

We can now add a Cycle Detection rule into our calculus. This is done by
the following:

Definition 5. A proof tree $T$ is said to be an extended proof tree iff for all
positions p in $T$ such that $T(p) \neq \emptyset$, either one of the conditions of Definition 2
holds or one of the following conditions holds:

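- $(q, p)$ is a cycle of order 1 w.r.t. $S_{init}, \theta, \eta$, and

  - $T(p.1) = T(p) \cup (S_{init}\theta^n)\eta$,

  - and $T(p.i) = \emptyset$ if $i \neq 1$;

- or $(q, p, q')$ is a cycle of order 2 w.r.t. $S_{init}, \theta, \eta$ and:

  - $T(p.1) = T(p) \cup (S_{init}\theta'^{n})\eta$,

  - $T(p.2) = T(p) \cup (S_{init}\theta^n)\eta$,

  - $T(p.i) = \emptyset$ if $i \neq 1, 2$.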

By Lemma 2, $S_{init}\theta^n$ can be denoted by a set of first-order formulae on
I-terms. The next theorem can be considered as the “main result” of this paper:

Theorem 1. The Cycle Detection rule is correct, i.e. if a formula F has an
extended closed proof tree then it must be unsatisfiable.



Example 2. (continued) Let us apply the cycle detection rule to the proof tree
obtained in Example 1. We get the following enriched proof tree:

p(g(a),a,b) and all(x, all(y, not p(g(a),x,y) or p(g(a),f(x),g(y))))
p(g(a),a,b)
all(x, all(y, not p(g(a),x,y) or p(g(a),f(x),g(y))))
all(y, not p(g(a),a,y) or p(g(a),f(a),g(y)))
not p(g(a),a,b) or p(g(a),f(a),g(b))
    not p(g(a),a,b)
    false

    p(g(a),f(a),g(b))
    p(g(a),f(<>)^n.a,g(<>)^n.b)      % <> denotes the "hole"

Keeping the notations of Definition 3, θ is the substitution (which is clearly
simply recursive) {_Y → f(_Y), _Z → g(_Z)} and η is {_Y → a, _Z → b}.
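
As a worked illustration of the schematized leaf (a sketch that only assumes the two rewriting rules of Section 2.1, $t^0.s \rightarrow s$ and $t^{n+1}.s \rightarrow t(t^n.s)$), instantiating the integer variable with n = 2 unfolds the new formula as follows:

```latex
\begin{align*}
f(\diamond)^2.a &\rightarrow f(f(\diamond)^1.a) \rightarrow f(f(f(\diamond)^0.a)) \rightarrow f(f(a)) \\
g(\diamond)^2.b &\rightarrow g(g(\diamond)^1.b) \rightarrow g(g(g(\diamond)^0.b)) \rightarrow g(g(b)) \\
p(g(a), f(\diamond)^2.a, g(\diamond)^2.b) &\rightarrow^{*} p(g(a), f(f(a)), g(g(b)))
\end{align*}
```

For n = 1 the same unfolding gives p(g(a),f(a),g(b)), the leaf already present in the tree of Example 1.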

5 Implementation and Examples of Application 

The techniques and algorithms described in this paper have been implemented
in a small theorem prover, called μ-IPS_ATINF. Though μ-IPS_ATINF does provide
some features for automatic theorem proving (using unification, iterative depth-first
search, or breadth-first search, with very simple selection strategies allowing
to restrict or to delay the application of the most costly rules), its intent is
mainly to allow a user to construct proof trees interactively and to experiment
with the techniques proposed in the present paper, namely proof generalization,
cycle detection and computation of term schematizations. The integration of
these techniques into an efficient, “real-world” prover and the corresponding
experiments remain to be done (to the best of our knowledge, no existing
running prover integrates similar techniques). μ-IPS_ATINF is implemented in
B-Prolog (see for example http://www.probp.com/).

We now give a slightly more complicated example, coming from a resolu- 
tion proof, and including crossed recursion and equational reasoning. We show 
how the corresponding proof can be transformed into our calculus and how the 
corresponding cycle can be detected and eliminated. 

1 f(0) = succ(0)
2 p(y, 0)
3 g(f(x)) ≠ g(succ(x)) ∨ ¬p(u, x) ∨ p(u, succ(x))
4 ¬p(0, succ(x)) ∨ f(succ(x)) = succ(f(x))



Assume that the following derivation S has been constructed: 

5 g(succ(0)) ≠ g(succ(0)) ∨ ¬p(u, 0) ∨ p(u, succ(0)) (paramodulation, 1, 3)

6 ¬p(u, 0) ∨ p(u, succ(0)) (resolution, 5, reflexivity axiom)

7 p(u, succ(0)) (resolution, 2, 6)

8 f(succ(0)) = succ(f(0)) (resolution, 4, 7)

9 f(succ(0)) = succ(succ(0)) (paramodulation, 1, 8)



First, S is automatically translated into our calculus. Applications of the
resolution rule to ground unit clauses can be handled by the standard rules
(closing and ∨-rules). Any application of the resolution rule to non-ground or
non-unit clauses may be simulated by an application of the “cut” rule*. Each
application of the paramodulation rule is replaced by several resolution steps
with the appropriate substitutivity axioms. Note that this technique can be
applied to any similar macro-inference rule.

We obtain (after generalization and simplification) the following proof tree
(in this particular case, the generalized proof tree is almost identical to the
original one, except that the term 0 is replaced by a variable _X). For the sake
of clarity and conciseness, we dropped the irrelevant equality axioms and merged
some of the resolution steps into a single “cut” rule.

f(_X) = succ(_X)
all(y,p(y,_X))

all(x,all(u,not(g(f(x)) = g(succ(x))) or not(p(u,x)) or p(u,succ(x))))
all(x,not(p(_X,succ(x)) or f(succ(x)) = succ(f(x))))

not(all(u,p(u,succ(_X)))) % an application of the cut rule for cl. 7
exists(u,not(p(u,succ(_X))))
not(p(c,succ(_X)))

all(u,not(g(f(_X)) = g(succ(_X))) or not(p(u,_X)) or p(u,succ(_X)))
not(g(f(_X)) = g(succ(_X))) or not(p(c,_X)) or p(c,succ(_X))
not(g(f(_X)) = g(succ(_X)))

all(x,all(y,g(x) = g(y) or not(x = y))) % an equality axiom
g(f(_X)) = g(succ(_X)) or not(f(_X) = succ(_X))
g(f(_X)) = g(succ(_X))
false

not(f(_X) = succ(_X))
false

not(p(c,_X))

p(c,_X)

false

* More precisely, a resolution step between P(t) ∨ R and ¬P(t') ∨ R' is
replaced by an application of the cut rule on the clause (∀x)(R ∨ R')θ, where θ
is the m.g.u. of t and t' and x is the vector of variables in (R ∨ R')θ.
p(c,succ(_X))

false

all(u,p(u,succ(_X))) % thus clause 7 must be true

not(p(_X,succ(_X)) or f(succ(_X)) = succ(f(_X)))
not(p(_X,succ(_X)))
p(_X,succ(_X))
false

f(succ(_X)) = succ(f(_X))

f(succ(_X)) = succ(succ(_X)) % using equality axioms as previously

Here, we obtain a proof cycle of order 1 between f(succ(_X)) = succ(succ(_X))
and f(_X) = succ(_X) on the one hand, and between all(y,p(y,_X)) and
all(u,p(u,succ(_X))) on the other hand. The reader can check that we can apply
the cycle detection rule (of order 1) in order to derive the clauses:
(∀n) f(succ(⋄)^n.0) = f(succ(succ(⋄)^n.0)) and (∀u, n) p(u, succ(⋄)^n.0).

6 Conclusion 

We have presented some techniques for using term schematizations in Automated
Deduction. They are particularly useful for integrating term schematizations
into proof calculi that do not have the lemma building property of the
resolution method (such as the tableaux or connection calculus) or into calculi
that combine features from both approaches, such as model elimination
procedures or model generation theorem provers (see for example [7]). This
could help to reduce the search space, and possibly to improve the termination
behavior of these methods. The corresponding algorithms have been implemented
in the context of an interactive theorem prover, μ-IPS_ATINF. We believe that
the Cycle Detection rule could be added to a theorem prover at a “reasonable”
computational cost. Indeed, the construction of the generalized proof tree can
be done in linear time w.r.t. the size of the original proof (it can actually
be done dynamically during the construction of the proof) and only requires a
small amount of additional memory. Moreover, the Cycle Detection rule is also
polynomial w.r.t. the size of the considered proof tree. Heuristics could be
used to decide whether the system should try to apply the Cycle Detection rule
(for example if it detects a “potential cycle” such as p(a), p(f(a)), ...). The
price to be paid, of course, is that the theorem prover must be able to deal
with the more expressive formalism. Hence, unification algorithms for term
schemes (using for example the procedures described in [1,3,12,8,4]) have to be
added to the prover. These algorithms are in general significantly more costly
than the standard unification procedure. However, this cost could be offset by
the additional expressive power of term schematizations (otherwise, it would be
necessary to further restrict the considered class of term schematizations in
order to ensure that unification problems may be solved effectively). We
consider that these questions deserve to be investigated more deeply. The
possibility of dealing with cycles containing an indefinite number of distinct
open branches could be investigated. This would require extending the existing
term schematization techniques in order to denote infinite sets of formulae and
infinite sets of tableaux instead of terms. Some completeness or incompleteness
results could also be investigated for the Cycle Detection rule, i.e. one
should try to identify classes of formulae F for which any logical consequence
C of F (where C possibly contains integer variables) can be deduced using the
usual inference rules enriched by the Cycle Detection rule. Another direction
for future research is to investigate the effect of adding the Cycle Detection
rule on the termination behavior of the calculus. Do there exist classes of
formulae for which the calculus enriched by the Cycle Detection rule (and by
appropriate strategies and redundancy checking mechanisms) can be proven to be
a decision procedure?
Abstract. The dependency pair method of Arts and Giesl is the most
powerful technique for proving termination of term rewrite systems au- 
tomatically. We show that the method can be improved by using tree 
automata techniques to obtain better approximations of the dependency 
graph. This graph determines the ordering constraints that need to be 
solved in order to conclude termination. We further show that by us- 
ing our approximations the dependency pair method provides a decision 
procedure for termination of right-ground rewrite systems. 



1 Introduction 

In the area of term rewriting termination has been studied for several decades 
and many powerful techniques have been developed. Three general directions 
can be distinguished: 

1. Syntactic methods that compare terms by constructing an explicit well- 
founded order. These methods are fully automatable but have limited power. 
Well-known examples are the recursive path order of Dershowitz [10] and the 
Knuth-Bendix order [18]. 

2. Semantic methods that compare terms by interpreting them in some well- 
founded domain. These methods can be very powerful in theory but their im- 
plementations rely on heuristics that greatly reduce this power. Well-known 
examples are the polynomial interpretations of Lankford [22] and the seman- 
tic path order of Kamin and Levy [17]. 

3. Transformation methods which do not attempt to prove termination directly 
but rather transform the given rewrite system into another rewrite system 
such that termination of the latter system is easier to prove and implies ter- 
mination of the former system. Examples include the transformation order of 
Bellegarde and Lescanne [6], and Zantema’s distribution elimination [27] and 
semantic labelling [28]. Transformations differ in their degree of automation. 

Since termination is an undecidable property of rewrite systems, even for systems 
that consist of a single rewrite rule, no method will work in all cases. Recently a 
new automatable technique emerged: the dependency pair method of Arts and 
Giesl. In this method a rewrite system is transformed into a set of ordering 
constraints such that termination of the rewrite system is equivalent to the
solvability of the constraints. The generated constraints are typically solved by 
standard techniques (polynomial interpretations, path orders), even when these 
techniques are not applicable to the original rewrite system. The power of the 
dependency pair method has been amply illustrated in a sequence of papers by 
Arts and Giesl [2,3,4]. 

The ordering constraints in the dependency pair method are generated by
analyzing the cycles in the dependency graph. This graph summarizes the
relationships between the dependency pairs of the rewrite system. More
precisely, there is an arrow from dependency pair s → t to dependency pair
u → v in the dependency graph if some instance of t rewrites to some instance
of u. Since this is undecidable in general, the dependency graph has to be
estimated by a decidable approximation. Arts and Giesl proposed the following
simple algorithm for this purpose: there is an arrow in the so-called estimated
dependency graph from s → t to u → v if the term obtained from t by replacing
all outermost defined symbols by variables and a subsequent linearization
unifies with u.

The approximation of Arts and Giesl often results in an unnecessarily large 
graph and hence a large number of constraints. Sometimes, as examples in this 
paper will demonstrate, this causes the failure of the termination proof. The 
aim of this paper is to show that by using tree automata techniques we obtain a 
much better estimation of the dependency graph. Our approach is based on the 
following two ingredients: 

1. The powerful framework of Durand and Middeldorp for the study of decidable
call-by-need computations in orthogonal term rewriting. This framework
is parameterized by so-called approximation mappings. An approximation 
mapping abstracts from certain parts of the terms in the rewrite rules such 
that the set of terms that rewrite to a term in an arbitrary regular tree 
language is again regular. 

2. The folklore result that it is decidable whether the set of ground instances 
of an arbitrary term intersects with a regular tree language. This result is 
well-known for linear terms but it also holds for non-linear terms. 

We show that by adopting the so-called nv approximation we always obtain 
an estimation of the dependency graph which is at least as good as the one 
of Arts and Giesl but often better. Interestingly, we can automatically prove 
termination of rewrite systems outside the class of so-called DP quasi-simply 
terminating systems. This class, proposed by Giesl and Ohlebusch [14], consists 
of all rewrite systems “where an automated termination proof using dependency 
pairs is potentially feasible”.

The remainder of the paper is organized as follows. In the next section we 
briefly recall the basics of the dependency pair technique. Section 3 contains 
background material on tree automata. In Section 4 we define new approxima- 
tions of the dependency graph. We compare our approximations with the one of 
Arts and Giesl in Section 5. We also include a comparison with the
approximation of Kusakari and Toyama [19,21]. In Section 6 we show that by using
our approximations the dependency pair method provides a decision procedure for
termination of right-ground rewrite systems.

2 Dependency Pairs 

We assume familiarity with the basics of term rewriting ([5]). A term rewrite
system (TRS for short) consists of rewrite rules l → r that satisfy l ∉ V and
Var(r) ⊆ Var(l). If these conditions are not imposed we find it useful to speak
of extended TRSs (eTRSs). Such systems arise naturally when we approximate
TRSs or orient the rewrite rules from right to left, as explained in Section 4.
Note that eTRSs which are not TRSs can never be terminating, but in this
paper we will make clear that such eTRSs are very useful for automatically
proving termination of TRSs.

Below we recall the basic notions and results of the dependency pair technique
of Arts and Giesl. We refer to [2,4,12] for motivations and further refinements.
We adopt the notation of [13,20]. Let ℛ be a TRS over a signature ℱ. As usual,
root symbols of left-hand sides of rewrite rules are called defined. Let ℱ♯
denote the union of ℱ and {f♯ | f is a defined symbol of ℛ} where f♯ has the
same arity as f. Given a term t = f(t1, ..., tn) ∈ T(ℱ, V) with f defined, we
write t♯ for the term f♯(t1, ..., tn). If l → r ∈ ℛ and t is a subterm of r
with defined root symbol then the rewrite rule l♯ → t♯ is called a dependency
pair of ℛ. The set of all dependency pairs of ℛ is denoted by DP(ℛ). In
examples we often write F for f♯.

An argument filtering for a signature ℱ is a mapping π that associates with
every n-ary function symbol an argument position i ∈ {1, ..., n} or a (possibly
empty) list [i1, ..., im] of argument positions with 1 ≤ i1 < ... < im ≤ n.
The signature ℱ_π consists of all function symbols f such that π(f) is some
list [i1, ..., im], where in ℱ_π the arity of f is m. Every argument filtering
π induces a mapping from T(ℱ, V) to T(ℱ_π, V), also denoted by π:

π(t) = t                            if t is a variable,
π(t) = π(ti)                        if t = f(t1, ..., tn) and π(f) = i,
π(t) = f(π(t_i1), ..., π(t_im))     if t = f(t1, ..., tn) and π(f) = [i1, ..., im].

Thus, an argument filtering is used to replace function symbols by one of their 
arguments or to eliminate certain arguments of function symbols. 
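As an illustration, the induced mapping on terms can be sketched in Python as
follows. The term encoding (variables as strings, applications as tuples) and
the convention that symbols without an entry in the filtering keep all their
arguments are assumptions of this sketch.

```python
def apply_filter(pi, t):
    """Apply the argument filtering pi (a dict from symbols to either an
    argument position or a list of positions, both 1-based) to term t."""
    if isinstance(t, str):          # t is a variable
        return t
    f, args = t[0], t[1:]
    # Assumption: symbols not mentioned in pi keep all their arguments.
    spec = pi.get(f, list(range(1, len(args) + 1)))
    if isinstance(spec, int):       # pi(f) = i: collapse to the i-th argument
        return apply_filter(pi, args[spec - 1])
    # pi(f) = [i1, ..., im]: keep only the listed arguments
    return (f,) + tuple(apply_filter(pi, args[i - 1]) for i in spec)

# pi(f) = 1 collapses f(x, g(x, b)) to x
collapsed = apply_filter({"f": 1}, ("f", "x", ("g", "x", ("b",))))
# pi(g) = [2] turns g(x, h(y)) into the unary term g(h(y))
pruned = apply_filter({"g": [2]}, ("g", "x", ("h", "y")))
```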

A preorder is a transitive and reflexive relation. A rewrite preorder is a
preorder ≳ on terms that is closed under contexts and substitutions. A
reduction pair consists of a rewrite preorder ≳ and a compatible well-founded
order > which is closed under substitutions. Here compatibility means that the
inclusion ≳ · > ⊆ > or the inclusion > · ≳ ⊆ > holds. The following theorem
presents the basic dependency pair approach.

Theorem 1 (Arts and Giesl [4]). A TRS ℛ over a signature ℱ is terminating
if and only if there exists an argument filtering π for ℱ♯ and a reduction pair
(≳, >) such that π(ℛ) ⊆ ≳ and π(DP(ℛ)) ⊆ >. □

Because rewrite rules are just pairs of terms, π(ℛ) ⊆ ≳ is a shorthand for
π(l) ≳ π(r) for every rewrite rule l → r ∈ ℛ. From now on we assume that all
(e)TRSs are finite.

Rather than considering all dependency pairs at the same time, as in the
above theorem, it is advantageous to treat groups of dependency pairs separately.
These groups correspond to cycles in the dependency graph DG(ℛ) of ℛ. The
nodes of DG(ℛ) are the dependency pairs of ℛ and there is an arrow from s → t
to u → v if and only if there exist substitutions σ and τ such that
tσ →*_ℛ uτ. (By renaming variables in different occurrences of dependency pairs
we may assume that σ = τ.) A non-empty subset C of DP(ℛ) is called a cycle if
for every two (not necessarily distinct) pairs s → t and u → v in C there exists
a non-empty path in C from s → t to u → v.

Theorem 2 (Arts and Giesl [2]). A TRS ℛ is terminating if and only if for
every cycle C in DG(ℛ) there exists an argument filtering π and a reduction pair
(≳, >) such that π(ℛ) ⊆ ≳, π(C) ⊆ ≳ ∪ >, and π(C) ∩ > ≠ ∅. □

Note that π(C) ∩ > ≠ ∅ denotes the situation that π(s) > π(t) for at least
one dependency pair s → t ∈ C.
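The notion of cycle used above (a non-empty set of nodes in which every node
reaches every node, itself included, by a non-empty path inside the set) can be
made concrete with a brute-force Python sketch. The function names are
hypothetical, and the enumeration over all subsets is exponential, so this is
only meant for very small graphs. The sample graph is the one described for
DG_nv in Example 21, with arrows from (1) to (1) and (2), from (2) to (3), and
from (3) to (3).

```python
from itertools import combinations

def reachable_within(c, edges, start):
    """Nodes of c reachable from start by a non-empty path inside c."""
    seen = set()
    stack = [v for (u, v) in edges if u == start and v in c]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(v for (u, v) in edges if u == n and v in c)
    return seen

def cycles(nodes, edges):
    """All non-empty subsets c of nodes such that every node of c reaches
    every node of c by a non-empty path that stays inside c."""
    result = []
    for k in range(1, len(nodes) + 1):
        for c in combinations(nodes, k):
            cs = set(c)
            if all(reachable_within(cs, edges, n) == cs for n in c):
                result.append(cs)
    return result

edges = [(1, 1), (1, 2), (2, 3), (3, 3)]
found = cycles([1, 2, 3], edges)
```

For this graph the sketch finds exactly the two cycles {(1)} and {(3)}.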

Since it is undecidable whether there exist substitutions σ, τ such that
tσ →*_ℛ uτ, the dependency graph cannot be computed in general. Hence, in order
to mechanize the termination criterion of Theorem 2 one has to approximate the
dependency graph. To this end, Arts and Giesl proposed a simple algorithm.

Definition 3. Let ℛ be a TRS. The nodes of the estimated dependency graph
EDG(ℛ) are the dependency pairs of ℛ and there is an arrow from s → t to
u → v if and only if REN(CAP(t)) and u are unifiable. Here CAP replaces all
outermost subterms with a defined root symbol by distinct fresh variables and
REN replaces all occurrences of variables by distinct fresh variables.
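The construction of Definition 3 can be sketched in Python as follows. The term
encoding (variables as strings, applications as tuples), the naming scheme
`_v0, _v1, ...` for fresh variables (assumed not to clash with rule variables),
and all function names are assumptions of this sketch; the unification routine
is a plain textbook one with occurs check.

```python
import itertools

def is_var(t):
    return isinstance(t, str)

def cap(t, defined, fresh):
    """CAP: replace outermost subterms with a defined root by fresh variables."""
    if is_var(t):
        return t
    if t[0] in defined:
        return next(fresh)
    return (t[0],) + tuple(cap(a, defined, fresh) for a in t[1:])

def ren(t, fresh):
    """REN: replace every variable occurrence by a distinct fresh variable."""
    if is_var(t):
        return next(fresh)
    return (t[0],) + tuple(ren(a, fresh) for a in t[1:])

def walk(t, s):
    """Follow variable bindings of substitution s starting from t."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(x, t, s):
    t = walk(t, s)
    if is_var(t):
        return t == x
    return any(occurs(x, a, s) for a in t[1:])

def unify(a, b, s):
    """Return an extension of s that unifies a and b, or None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return unify(b, a, s)
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s

def edg(pairs, defined):
    """Arrows (i, j) of the estimated dependency graph, as index pairs."""
    fresh = (f"_v{i}" for i in itertools.count())
    arrows = []
    for i, (_, t) in enumerate(pairs):
        capped = ren(cap(t, defined, fresh), fresh)
        for j, (u, _) in enumerate(pairs):
            if unify(capped, u, {}) is not None:
                arrows.append((i, j))
    return arrows

# Dependency pairs of f(g(a)) -> f(a), a -> b (defined symbols: f and a)
pairs13 = [(("F", ("g", ("a",))), ("F", ("a",))),
           (("F", ("g", ("a",))), ("A",))]
arrows13 = edg(pairs13, {"f", "a"})

# Dependency pair of f(x, x) -> f(a, b) (defined symbol: f)
pairs14 = [(("F", "x", "x"), ("F", ("a",), ("b",)))]
arrows14 = edg(pairs14, {"f"})
```

With 0-based indices, the first system yields an arrow from the first pair to
itself and one from the first pair to the second; the second system yields no
arrows, in line with Examples 13 and 14.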



Lemma 4 (Arts and Giesl [4]). Let ℛ be a TRS.

1. EDG(ℛ) is computable.

2. DG(ℛ) ⊆ EDG(ℛ). □

3 Tree Automata 

We briefly recall some basic definitions and results concerning tree automata. 
Much more information can be found in [8]. 

A (finite bottom-up) tree automaton is a quadruple A = (ℱ, Q, Q_f, Δ)
consisting of a finite signature ℱ, a finite set Q of states, disjoint from ℱ,
a subset Q_f ⊆ Q of final states, and a set of transition rules Δ. Every
transition rule has the form f(q1, ..., qn) → q with f ∈ ℱ and
q1, ..., qn, q ∈ Q. So a tree automaton A = (ℱ, Q, Q_f, Δ) is simply a finite
ground TRS (ℱ ∪ Q, Δ) whose rewrite rules have a special shape, together with a
subset Q_f of Q. The induced rewrite relation on T(ℱ ∪ Q) is denoted by →_A. A
ground term t ∈ T(ℱ) is accepted by A if t →*_A q for some q ∈ Q_f. The set of
all such terms is denoted by L(A). A subset L ⊆ T(ℱ) is called regular if there
exists a tree automaton A = (ℱ, Q, Q_f, Δ) such that L = L(A). It is well-known
that every regular language is accepted by a deterministic tree automaton
without inaccessible states. A deterministic automaton has no two different
rules with the same left-hand side. A state is inaccessible if no ground term
reduces to it. In this paper we make use of the additional properties mentioned
below.

Lemma 5. The set of ground instances of a linear term is regular. □ 

We write Σ(t) for the set of ground instances of the term t. The next result
states that it is decidable whether a ground instance of an arbitrary term is
accepted by a given tree automaton. For a linear term t this is obvious since (1)
Σ(t) is regular by Lemma 5, (2) regular languages are effectively closed under
intersection, and (3) emptiness is decidable for regular languages. The point is
that the problem remains decidable for non-linear terms. This extension will
turn out to be very important for automatically proving termination of TRSs
that rely on non-linearity (i.e., by linearizing the rewrite rules the TRS
becomes non-terminating).

Theorem 6 (Tison [26]). The following problem is decidable:

instance: tree automaton A, term t

question: Σ(t) ∩ L(A) = ∅?

Proof. First we transform A into an equivalent deterministic tree automaton
B = (ℱ, Q, Q_f, Δ) without inaccessible states. We claim that Σ(t) ∩ L(A) ≠ ∅
if and only if there exists a mapping σ: Var(t) → Q such that tσ ∈ L(B).

⇒ Suppose Σ(t) ∩ L(A) ≠ ∅. So there exists a substitution τ: Var(t) → T(ℱ)
such that tτ ∈ L(A) = L(B). Hence tτ →*_B q for some q ∈ Q_f. In this
sequence every subterm τ(x) of tτ is reduced to some state. Because B is
deterministic, different occurrences of τ(x) in tτ reduce to the same state,
say q_x ∈ Q. Define the mapping σ: Var(t) → Q by σ(x) = q_x for every
x ∈ Var(t). Clearly tτ →*_B tσ →*_B q and hence tσ ∈ L(B).

⇐ Suppose tσ ∈ L(B) for some mapping σ: Var(t) → Q. So tσ →*_B q for
some q ∈ Q_f. Since all states of B are accessible, there exists a substitution
τ: Var(t) → T(ℱ) such that τ(x) →*_B σ(x) for all x ∈ Var(t). Hence
tτ →*_B tσ →*_B q and thus tτ ∈ L(B) = L(A). In other words, Σ(t) ∩ L(A) ≠ ∅.

Since there are only finitely many mappings from Var(t) to Q, this yields a
decision procedure. □

We stress that for a linear term t there is no need to perform the expensive 
determinization of A. 
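The decision procedure from the proof of Theorem 6 can be sketched in Python
for an automaton that is already deterministic. The encoding (transition rules
as a dictionary from (symbol, tuple of states) to a state, terms as nested
tuples with string variables) and the function names are assumptions of this
sketch; the determinization step itself is not shown. Inaccessible states are
simply ignored when enumerating the mappings from Var(t) to Q.

```python
from itertools import product

def variables(t):
    """Set of variables occurring in term t."""
    if isinstance(t, str):
        return {t}
    return set().union(*(variables(a) for a in t[1:]))

def accessible(delta):
    """States to which at least one ground term reduces."""
    acc, changed = set(), True
    while changed:
        changed = False
        for (_, qs), q in delta.items():
            if q not in acc and all(p in acc for p in qs):
                acc.add(q)
                changed = True
    return acc

def run(t, delta, sigma):
    """State reached by t under the state assignment sigma, or None."""
    if isinstance(t, str):
        return sigma[t]
    qs = tuple(run(a, delta, sigma) for a in t[1:])
    if None in qs:
        return None
    return delta.get((t[0], qs))

def intersects(t, delta, final):
    """True iff some ground instance of t is accepted (delta deterministic)."""
    xs = sorted(variables(t))
    states = sorted(accessible(delta))
    for choice in product(states, repeat=len(xs)):
        if run(t, delta, dict(zip(xs, choice))) in final:
            return True
    return False

# Deterministic automaton accepting exactly the term f(a, b)
delta = {("a", ()): "qa", ("b", ()): "qb", ("f", ("qa", "qb")): "qf"}
final = {"qf"}
linear = intersects(("f", "x", "y"), delta, final)
nonlinear = intersects(("f", "x", "x"), delta, final)
```

In this small example the linear term f(x, y) has an accepted ground instance
while the non-linear term f(x, x) has none, which is the kind of distinction
that is lost when terms are linearized.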
4 Approximations

In this section we define new approximations of the dependency graph. Our 
approximations are based on the framework of Durand and Middeldorp [11] for 
the study of decidable call-by-need computations in orthogonal term rewriting. 

If ℛ is an eTRS over a signature ℱ and L ⊆ T(ℱ) then (→*_ℛ)[L] denotes
the set of all terms s ∈ T(ℱ) such that s →*_ℛ t for some term t ∈ L.

Definition 7. An approximation mapping is a mapping α from eTRSs to
eTRSs with the property that →_ℛ ⊆ →*_{α(ℛ)} for every eTRS ℛ. In the following
we write ℛ_α instead of α(ℛ). We say that α is regularity preserving if
(→*_{ℛ_α})[L] is regular for all eTRSs ℛ and regular L.

In [11] an approximation mapping α is also required to satisfy the condition
that the ground normal forms of ℛ and ℛ_α coincide, but we do not need that
condition here. Next we define three approximation mappings that are known
to be regularity preserving. Our definitions are slightly different from the ones
found in the literature because we have to deal with possibly non-left-linear
TRSs.

Definition 8. Let ℛ be an eTRS. The strong approximation ℛ_s is obtained
from ℛ by replacing the right-hand side and all occurrences of variables in
the left-hand side of every rewrite rule by distinct fresh variables, i.e.,
ℛ_s = {REN(l) → x | l → r ∈ ℛ and x is a fresh variable}. The nv approximation
ℛ_nv is obtained from ℛ by replacing all occurrences of variables in the rewrite
rules by distinct fresh variables: ℛ_nv = {REN(l) → REN(r) | l → r ∈ ℛ}. An eTRS
is called growing if for every rewrite rule l → r the variables in
Var(l) ∩ Var(r) occur at depth 1 in l. The growing approximation ℛ_g is defined
as any left-linear growing eTRS that is obtained from ℛ by linearizing the
left-hand sides and renaming the variables in the right-hand sides that occur
at a depth greater than 1 in the corresponding left-hand sides.

For instance, if ℛ contains the rewrite rule f(x, g(x), y) → f(x, x, g(y)) then
ℛ_s contains f(x, g(x'), y) → z, ℛ_nv contains f(x, g(x'), y) → f(x'', x''', g(y')),
and ℛ_g contains f(x, g(x'), y) → f(x, x, g(y)) or f(x', g(x), y) → f(x'', x'', g(y)).
(The former is preferred as it is closer to the original rule. The ambiguity in the
definition of ℛ_g causes no problems in the sequel.)
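The strong and nv approximations are simple enough to be sketched directly in
Python. The term encoding (variables as strings, applications as tuples) and
the fresh-variable names `_v0, _v1, ...` are assumptions of this sketch; the
growing approximation is not covered.

```python
import itertools

def ren(t, fresh):
    """REN: replace every variable occurrence by a distinct fresh variable."""
    if isinstance(t, str):
        return next(fresh)
    return (t[0],) + tuple(ren(a, fresh) for a in t[1:])

def strong(rules):
    """Strong approximation: REN(l) -> x with x a fresh variable."""
    fresh = (f"_v{i}" for i in itertools.count())
    return [(ren(l, fresh), next(fresh)) for l, _ in rules]

def nv(rules):
    """nv approximation: REN(l) -> REN(r)."""
    fresh = (f"_v{i}" for i in itertools.count())
    return [(ren(l, fresh), ren(r, fresh)) for l, r in rules]

# The rule f(x, g(x), y) -> f(x, x, g(y)) from the text
rule = (("f", "x", ("g", "x"), "y"), ("f", "x", "x", ("g", "y")))
R_s = strong([rule])
R_nv = nv([rule])
```

Up to the names of the fresh variables, R_s and R_nv correspond to the rules
f(x, g(x'), y) → z and f(x, g(x'), y) → f(x'', x''', g(y')) given above.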

Theorem 9. The approximation mappings s, nv, and g are regularity preserv- 
ing. □ 

Nagaya and Toyama [24] proved the above result for the growing approximation;
the tree automaton that recognizes (→*_{ℛ_g})[L] is defined as the limit of
a finite saturation process. This saturation process is similar to the ones
defined in Comon [7] and Jacquemard [16], but by working exclusively with
deterministic tree automata, non-right-linear rewrite rules can be handled. For
the strong and nv approximation simpler constructions using ground tree
transducers are possible (see e.g. Durand and Middeldorp [11]).
Recently, Takai et al. [25] introduced the class of left-linear inverse finite
path overlapping rewrite systems and showed that the preceding theorem is 
true for the corresponding approximation mapping. Growing rewrite systems 
constitute a proper subclass of the class of inverse finite path overlapping rewrite 
systems. Since the definition of this class is rather difficult and the construction 
in the proof of regularity preservingness very complicated, we do not consider 
the inverse finite path overlapping approximation here. We note however that 
our results easily extend. 

Definition 10. Let ℛ be a TRS and α an approximation mapping. The nodes
of the α-approximated dependency graph DG_α(ℛ) are the dependency pairs of
ℛ and there is an arrow from s → t to u → v if and only if both
Σ(t) ∩ (→*_{ℛ_α})[Σ(REN(u))] ≠ ∅ and Σ(u) ∩ (→*_{(ℛ⁻¹)_α})[Σ(REN(t))] ≠ ∅.

So we draw an arrow from s → t to u → v if a ground instance of t rewrites
in ℛ_α to a ground instance of REN(u) and a ground instance of u rewrites in
(ℛ⁻¹)_α to a ground instance of REN(t). The reason for having both conditions
is that (1) for decidability t or u should be made linear and (2) depending on
α and ℛ, ℛ_α may better approximate ℛ than (ℛ⁻¹)_α approximates ℛ⁻¹, or
vice versa. Also, the more conditions one imposes, the closer one gets to the
real dependency graph.

Lemma 11. Let ℛ be a TRS and α an approximation mapping.

1. If α is regularity preserving then DG_α(ℛ) is computable.

2. DG(ℛ) ⊆ DG_α(ℛ).

Proof.

1. Let s → t and u → v be dependency pairs of ℛ. Because REN(u) is a linear
term, Σ(REN(u)) is regular (Lemma 5). Since α is regularity preserving,
(→*_{ℛ_α})[Σ(REN(u))] is regular. Hence, according to Theorem 6, it is decidable
whether Σ(t) intersects with (→*_{ℛ_α})[Σ(REN(u))]. By the same reasoning it
follows that it is decidable whether Σ(u) and (→*_{(ℛ⁻¹)_α})[Σ(REN(t))]
intersect. Hence it is decidable whether there exists an arrow from s → t to
u → v in DG_α(ℛ).

2. Suppose there is an arrow from dependency pair s → t to dependency pair
u → v in DG(ℛ). So tσ →*_ℛ uτ for some substitutions σ and τ. We may
assume without loss of generality that tσ and uτ are ground terms. Hence
tσ ∈ Σ(t) ⊆ Σ(REN(t)) and uτ ∈ Σ(u) ⊆ Σ(REN(u)). Consequently,
Σ(t) ∩ (→*_ℛ)[Σ(REN(u))] ≠ ∅ and Σ(u) ∩ (→*_{ℛ⁻¹})[Σ(REN(t))] ≠ ∅. Because α is
an approximation mapping, →*_ℛ ⊆ →*_{ℛ_α} and →*_{ℛ⁻¹} ⊆ →*_{(ℛ⁻¹)_α}. Therefore
Σ(t) ∩ (→*_{ℛ_α})[Σ(REN(u))] ≠ ∅ and Σ(u) ∩ (→*_{(ℛ⁻¹)_α})[Σ(REN(t))] ≠ ∅. In
other words, there exists an arrow from s → t to u → v in DG_α(ℛ). □

It should be clear that a better approximation mapping results in a better 
approximation of the dependency graph. Hence we have the following result. 
Lemma 12. DG_g(ℛ) ⊆ DG_nv(ℛ) ⊆ DG_s(ℛ) for every TRS ℛ. □

The reason for considering the strong and nv approximations in this paper is
that DG_s and DG_nv are easier to compute than DG_g, cf. the paragraph following
Theorem 9.

5 Comparison 

In this section we compare our α-approximated dependency graph with the
estimated dependency graph of Arts and Giesl and the approximation of the
dependency graph defined by Kusakari and Toyama [19,21].

The first two examples show that the s-approximated dependency graph and 
the estimated dependency graph are incomparable in general. 

Example 13. Consider the TRS ℛ consisting of the two rewrite rules

f(g(a)) → f(a)
a → b

There are two dependency pairs:

F(g(a)) → F(a)    (1)
F(g(a)) → A       (2)

Because REN(CAP(F(a))) = F(x) unifies with F(g(a)), EDG(ℛ) contains two
arrows:

(1) → (1),  (1) → (2)

We have (ℛ⁻¹)_s = {f(a) → x, b → x}. Hence (→*_{(ℛ⁻¹)_s})[{F(a)}] consists of all
terms of the form f^n(a), f^n(b), F(f^n(a)), F(f^n(b)) with n ≥ 0. The term F(g(a))
clearly does not belong to this set and hence there are no arrows in DG_s(ℛ).

Example 14. Consider the TRS ℛ consisting of the single rewrite rule

f(x, x) → f(a, b)

There is one dependency pair:

F(x, x) → F(a, b)

Because REN(CAP(F(a, b))) = F(a, b) and F(x, x) are not unifiable, EDG(ℛ)
contains no arrows. However, both Σ(F(a, b)) ∩ (→*_{ℛ_s})[Σ(REN(F(x, x)))] and
Σ(F(x, x)) ∩ (→*_{(ℛ⁻¹)_s})[Σ(REN(F(a, b)))] are non-empty, as witnessed by the
terms F(a, b) and F(f(a, b), f(a, b)).

The non-left-linearity in the preceding example is essential. This is shown in
Lemma 16 below. In the proof we make use of the following lemma. Here ←*_{ℛ_s}
is the inverse of the relation →*_{ℛ_s} (which is different from →*_{(ℛ⁻¹)_s}).

Lemma 15. (←*_{ℛ_s})[Σ(REN(t))] ⊆ Σ(REN(CAP(t))) for every TRS ℛ and term t.

Proof. Let ℱ be the signature of ℛ. We use induction on the structure of t. If t is
a variable or if the root symbol of t is a defined symbol then CAP(t) is a variable
and hence Σ(REN(CAP(t))) = T(ℱ) and thus trivially (←*_{ℛ_s})[Σ(REN(t))] ⊆
Σ(REN(CAP(t))). Suppose t = f(t1, ..., tn) with f a constructor. Because the
left-hand side of every rule in ℛ_s starts with a defined symbol and the arguments
of REN(t) do not share variables, (←*_{ℛ_s})[Σ(REN(t))] = {f(s1, ..., sn) |
si ∈ (←*_{ℛ_s})[Σ(REN(ti))]}. Also Σ(REN(CAP(t))) = {f(s1, ..., sn) |
si ∈ Σ(REN(CAP(ti)))}. Hence the desired inclusion follows from the induction
hypothesis. □

The previous lemma does not hold for eTRSs. For instance, consider the
eTRS ℛ = {x → a} over the signature consisting of the constants a and b. If
t = b then (←*_{ℛ_s})[Σ(REN(t))] = {a, b} and Σ(REN(CAP(t))) = {b}.

Lemma 16. If ℛ is a left-linear TRS then DG_s(ℛ) ⊆ EDG(ℛ).

Proof. Suppose there is an arrow from dependency pair s → t to dependency
pair u → v in DG_s(ℛ). By definition, Σ(t) ∩ (→*_{ℛ_s})[Σ(REN(u))] ≠ ∅. Since ℛ
is left-linear, u is a linear term and thus Σ(REN(u)) = Σ(u). Hence there exist
ground substitutions σ and τ such that tσ →*_{ℛ_s} uτ. Clearly tσ ∈ Σ(REN(t)).
According to the preceding lemma uτ ∈ Σ(REN(CAP(t))). Since REN(CAP(t))
and u do not share variables, they are unifiable and thus there exists an arrow
from s → t to u → v in EDG(ℛ). □

Actually, with the strong approximation we can never benefit from non- 
linearity. This is formally expressed in the following lemma. 

Lemma 17. Let ℛ be a nonempty eTRS, t a term, and L a set of ground terms.
The following statements are equivalent:

1. Σ(t) ∩ (→*_{ℛ_s})[L] ≠ ∅,

2. Σ(REN(t)) ∩ (→*_{ℛ_s})[L] ≠ ∅.

Proof.

⇒ Obvious since Σ(t) ⊆ Σ(REN(t)).

⇐ Let Δ be an arbitrary ground redex and define the substitution
σ = {x ↦ Δ | x ∈ Var(t)}. Because in ℛ_s a redex can be rewritten to any term,
tσ →*_{ℛ_s} t' for every t' ∈ Σ(REN(t)). Hence, if t' ∈ Σ(REN(t)) ∩ (→*_{ℛ_s})[L]
then tσ ∈ Σ(t) ∩ (→*_{ℛ_s})[L]. □

As a consequence, the strong approximation is not all that useful for ap- 
proximating dependency graphs. For the nv approximation matters are quite 
different. Our next result states that the nv-approximated dependency graph is 
always a subgraph of the estimated dependency graph. In order to prove this, 
we need the following preliminary result. 

Lemma 18. (ℛ_nv)⁻¹ = (ℛ⁻¹)_nv for every eTRS ℛ.

Proof. Since ℛ_nv = {REN(l) → REN(r) | l → r ∈ ℛ}, the result is obvious. □

We stress that the above lemma is not true for the strong and growing ap- 
proximations. For the strong approximation the TRSs of Examples 13 and 14 
serve as counterexample. 

Theorem 19. DG_nv(ℛ) ⊆ EDG(ℛ) for every TRS ℛ.

Proof. Suppose there is an arrow from dependency pair s → t to dependency
pair u → v in DG_nv(ℛ). By definition, Σ(u) ∩ (→*_{(ℛ⁻¹)_nv})[Σ(REN(t))] ≠ ∅.
According to Lemmata 18 and 15, and using the observation that →*_{ℛ_nv} is a
subrelation of →*_{ℛ_s}, (→*_{(ℛ⁻¹)_nv})[Σ(REN(t))] = (←*_{ℛ_nv})[Σ(REN(t))] ⊆
(←*_{ℛ_s})[Σ(REN(t))] ⊆ Σ(REN(CAP(t))). Hence Σ(u) ∩ Σ(REN(CAP(t))) ≠ ∅ and hence
u and REN(CAP(t)) are unifiable. Therefore the arrow from s → t to u → v also
exists in EDG(ℛ). □

The next example shows that the nv-approximated dependency graph is in 
general a proper subgraph of the estimated dependency graph. 

Example 20. Consider the TRS ℛ consisting of the two rewrite rules

f(a, b, x) → f(x, x, x)
a → c

There is one dependency pair:

F(a, b, x) → F(x, x, x)

Since REN(CAP(F(x, x, x))) = F(x1, x2, x3) unifies with F(a, b, x), EDG(ℛ)
contains a cycle. We have Σ(REN(F(a, b, x))) = {F(a, b, t) | t ∈ T(ℱ)} and ℛ_nv =
{f(a, b, x) → f(x1, x2, x3), a → c}. Consequently (→*_{ℛ_nv})[Σ(REN(F(a, b, x)))] =
Σ(REN(F(a, b, x))) and since no instance of F(x, x, x) belongs to this set,
DG_nv(ℛ) contains no arrow. Therefore ℛ is trivially terminating.

The TRS in the above example is not DP quasi-simply terminating. The 
class of DP quasi-simply terminating TRSs was introduced by Giesl and Ohle- 
busch [14] and supposed to “capture all TRSs where an automated termination 
proof using dependency pairs is potentially feasible”. We note that the various 
refinements of the dependency pair method (narrowing, rewriting, instantiation; 
see Giesl and Arts [12]) are not applicable and moreover that proving innermost 
termination (which is easy with the standard dependency pair technique) is in- 
sufficient for termination as the TRS does not belong to a known class for which 
termination and innermost termination coincide. 

The next example shows a TRS that cannot be proved terminating with the 
nv approximation but whose (automatic) termination proof becomes easy with 
the growing approximation. 
Example 21. Consider the TRS $\mathcal{R}$ consisting of the three rewrite rules

    $f(x,a) \to f(x,g(x,b))$
    $g(h(x),y) \to g(x,h(y))$
    $g(a,y) \to y$

There are three dependency pairs:

    $F(x,a) \to F(x,g(x,b))$      (1)
    $F(x,a) \to G(x,b)$           (2)
    $G(h(x),y) \to G(x,h(y))$     (3)

One easily verifies that $DG_{nv}(\mathcal{R})$ contains two cycles:

    $\circlearrowright (1) \to (2) \to (3) \circlearrowleft$

In particular, $F(a,g(a,b)) \to_{\mathcal{R}_{nv}} F(a,a)$, which explains the arrows from (1) to
(1) and (2). The problematic cycle $\{(1)\}$ does not exist in $DG_g(\mathcal{R})$ because no
ground instance of $F(x,g(x,b))$ rewrites in $\mathcal{R}_g$ to a ground instance of $F(x,a)$:

    $(1) \qquad (2) \to (3) \circlearrowleft$

As a consequence, the resulting ordering constraints (obtained from Theorem 2)
are easily satisfied (e.g. by taking $\pi(f) = 1$ in combination with the lexicographic
path order with precedence $G > h$ and $g > h$).¹
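
Which dependency pairs lie on a cycle can be checked with a few lines of graph code. The sketch below is illustrative only: the adjacency dictionaries `dg_nv` and `dg_g` are a transcription of the two graphs of Example 21 as described in the text, and the function names are hypothetical.

```python
def reachable(graph, start):
    """Nodes reachable from start in one or more steps."""
    seen, stack = set(), list(graph.get(start, ()))
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(graph.get(v, ()))
    return seen

def nodes_on_cycles(graph):
    """A node lies on a cycle iff it can reach itself in one or more steps."""
    return {v for v in graph if v in reachable(graph, v)}

# Example 21, nv approximation: arrows (1)->(1), (1)->(2), (2)->(3), (3)->(3).
dg_nv = {1: [1, 2], 2: [3], 3: [3]}
# Example 21, growing approximation: the arrows leaving (1) disappear.
dg_g = {1: [], 2: [3], 3: [3]}

cyclic_nv = nodes_on_cycles(dg_nv)
cyclic_g = nodes_on_cycles(dg_g)
```

Only the pairs on cycles contribute ordering constraints, so removing pair (1) from the cyclic part is what makes the constraints easy to satisfy here.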

In the final part of this section we compare our α-approximated dependency
graph with the approximation of the dependency graph defined by Kusakari and
Toyama [19,21]. Their approximation relies on the concepts of ω-reduction and
Ω-reduction. The first concept stems from Huet and Lévy [15].

Let $\mathcal{R}$ be a TRS over a signature $\mathcal{F}$. Let $\Omega$ be a fresh constant. The set of
ground terms over the extended signature $\mathcal{F} \cup \{\Omega\}$ is denoted by $\mathcal{T}_\Omega(\mathcal{F})$. Given
a term $t \in \mathcal{T}(\mathcal{F}, \mathcal{V})$, the term in $\mathcal{T}_\Omega(\mathcal{F})$ obtained from $t$ by replacing all variables
by $\Omega$ is denoted by $t_\Omega$. The prefix order $\leq$ on $\mathcal{T}_\Omega(\mathcal{F})$ is defined by the following
two clauses:

- $\Omega \leq t$ for every $t \in \mathcal{T}_\Omega(\mathcal{F})$,

- $f(s_1, \ldots, s_n) \leq f(t_1, \ldots, t_n)$ if $s_i \leq t_i$ for every $1 \leq i \leq n$.

Two terms $s, t \in \mathcal{T}_\Omega(\mathcal{F})$ are compatible, denoted by $s \uparrow t$, if there exists a
term $u \in \mathcal{T}_\Omega(\mathcal{F})$ such that both $s \leq u$ and $t \leq u$. Finally, ω-reduction is the
relation $\to_\omega$ on $\mathcal{T}_\Omega(\mathcal{F})$ defined as follows: $s \to_\omega t$ if and only if $s = C[s']$ and
$t = C[\Omega]$ such that $\Omega \neq s' \uparrow l_\Omega$ for some $l \to r \in \mathcal{R}$. It is easy to prove that

¹ Again, the TRS is not DP quasi-simply terminating. Unlike the previous example,
proving innermost termination is sufficient for termination, but the estimated
innermost dependency graph coincides with $EDG(\mathcal{R}) = DG_{nv}(\mathcal{R})$ and the narrowing
refinement for innermost termination fails to make the requirements for an automatic
proof easier.




604 A. Middeldorp 



ω-reduction is terminating and confluent. Hence every term $t \in \mathcal{T}_\Omega(\mathcal{F})$ has a
unique normal form, which is denoted by $\omega(t)$. It is well-known that ω-reduction
is closely related to the strong approximation. Below we make use of the following
well-known facts (for all terms $s, t \in \mathcal{T}_\Omega(\mathcal{F})$):

- $\omega(t) \leq t$,

- if $s \leq t$ then $\omega(s) \leq \omega(t)$.

The concept of Ω-reduction corresponds to the nv approximation and is defined
as follows: $s \to_\Omega t$ if and only if $s = C[s']$ and $t = C[r_\Omega]$ for some $l \to r \in \mathcal{R}$
such that $\Omega \neq s' \uparrow l_\Omega$. Unlike ω-reduction, Ω-reduction is in general neither
confluent nor terminating.
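
The prefix order, compatibility and the ω-normal form are all simple structural recursions. The following Python sketch is an illustration under stated assumptions: terms over the extended signature are encoded as nested tuples with the string `"Ω"` standing for the fresh constant, and the names `leq`, `compatible` and `omega_nf` are hypothetical. The normal form is computed bottom-up, which is one admissible strategy because ω-reduction is terminating and confluent.

```python
OMEGA = "Ω"  # the fresh constant; every other term is a tuple (symbol, arg1, ..., argn)

def leq(s, t):
    """Prefix order: Ω <= t for all t, and f(s1..sn) <= f(t1..tn) if si <= ti."""
    if s == OMEGA:
        return True
    if t == OMEGA:
        return False
    return (s[0] == t[0] and len(s) == len(t)
            and all(leq(x, y) for x, y in zip(s[1:], t[1:])))

def compatible(s, t):
    """s and t are compatible iff they have a common upper bound."""
    if s == OMEGA or t == OMEGA:
        return True
    return (s[0] == t[0] and len(s) == len(t)
            and all(compatible(x, y) for x, y in zip(s[1:], t[1:])))

def omega_nf(t, lhs_omega):
    """ω-normal form of t; lhs_omega lists the left-hand sides with variables replaced by Ω."""
    if t == OMEGA:
        return t
    # Normalize the arguments first, then try an ω-step at the root.
    t = (t[0],) + tuple(omega_nf(x, lhs_omega) for x in t[1:])
    if any(compatible(t, l) for l in lhs_omega):
        return OMEGA
    return t

A, B = ("a",), ("b",)
# Left-hand sides of the TRS of Example 20, f(a,b,x) and a, with x replaced by Ω.
LHS = [("f", A, B, OMEGA), A]

t = ("F", A, B, OMEGA)
nf = omega_nf(t, LHS)
```

On this input the subterm `a` is replaced by Ω while `b` and the root symbol `F` are left alone, and the result is below `t` in the prefix order, in line with the first of the two facts listed above.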

Lemma 22. Let $\mathcal{R}$ be a TRS. If $s \to^*_\Omega t$ and $s' \leq s$ then $s' \to^*_\Omega t'$ for some
$t' \leq t$.

Proof. Induction on the length of $s \to^*_\Omega t$, using the easy to prove fact that if
$s \to_\Omega t$ and $s' \leq s$ then $s' \to^{=}_\Omega t'$ for some $t' \leq t$. □

We now have all ingredients to define Kusakari and Toyama’s approximation 
of the dependency graph. Actually, their definition applies to AC rewriting, an 
extension that we do not consider in this paper. The definition below is the 
specialization to ordinary term rewriting. 

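Definition 23. Let $\mathcal{R}$ be a TRS. For every $n \geq 0$ we define the graph $DG^n_\Omega(\mathcal{R})$
as follows. Its nodes are the dependency pairs of $\mathcal{R}$ and there is an arrow from
$s \to t$ to $u \to v$ if and only if there exists a term $t' \in \mathcal{T}_\Omega(\mathcal{F})$ such that $t' \uparrow u_\Omega$ and
either $t_\Omega \to^m_\Omega t'$ with $m < n$ or $t_\Omega \to^n_\Omega \cdot \to^!_\omega t'$. (Note that the latter condition
is equivalent to $t_\Omega \to^n_\Omega t''$ and $t' = \omega(t'')$ for some term $t'' \in \mathcal{T}_\Omega(\mathcal{F})$.)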

Lemma 24 (Kusakari and Toyama [19,21]). Let $\mathcal{R}$ be a TRS and $n \geq 0$.

1. $DG^n_\Omega(\mathcal{R})$ is computable.

2. $DG(\mathcal{R}) \subseteq DG^n_\Omega(\mathcal{R})$. □

It is not difficult to show that $EDG(\mathcal{R})$ and $DG^n_\Omega(\mathcal{R})$ are incomparable in
general, for all $n \geq 0$ (contradicting the remark in Kusakari and Toyama [21]
that their algorithm for approximating the dependency graph is more powerful
than the one of Arts and Giesl). For instance, for the TRS $\mathcal{R}$ of Example 14
$DG^n_\Omega(\mathcal{R})$ contains a cycle for every $n \geq 0$ whereas $EDG(\mathcal{R})$ is empty. The same
holds for $DG_s(\mathcal{R})$. However, it is easy to prove that $DG_s(\mathcal{R})$ is always a subgraph
of $DG^n_\Omega(\mathcal{R})$ and sometimes a proper subgraph, like in Example 13 where
DG^oi'^) coincides with EDG(T^). Below we compare Kusakari and Toyama’s 
approximation with our nv-approximated dependency graph. 



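Lemma 25. Let $\mathcal{R}$ be a TRS. If $n > 0$ then $DG^n_\Omega(\mathcal{R}) \subseteq DG^{n-1}_\Omega(\mathcal{R})$.

Proof. Suppose there is an arrow from dependency pair $s \to t$ to dependency
pair $u \to v$ in $DG^n_\Omega(\mathcal{R})$. So there exists a term $t' \in \mathcal{T}_\Omega(\mathcal{F})$ such that $t' \uparrow u_\Omega$ and
either $t_\Omega \to^m_\Omega t'$ with $m < n$ or $t_\Omega \to^n_\Omega \cdot \to^!_\omega t'$. First suppose that $t_\Omega \to^m_\Omega t'$
with $m < n$. If $m < n - 1$ then the arrow from $s \to t$ to $u \to v$ also exists in
$DG^{n-1}_\Omega(\mathcal{R})$. If $m = n - 1$ then we reason as follows. Since $\omega(t') \leq t'$, $\omega(t') \uparrow u_\Omega$.
Clearly $t_\Omega \to^{n-1}_\Omega t' \to^!_\omega \omega(t')$. Hence the arrow from $s \to t$ to $u \to v$ exists in
$DG^{n-1}_\Omega(\mathcal{R})$. Finally consider the case that $t_\Omega \to^n_\Omega \cdot \to^!_\omega t'$. So there exist terms
$t_1$ and $t_2$ such that $t_\Omega \to^{n-1}_\Omega t_1 \to_\Omega t_2 \to^!_\omega t'$. Clearly $t_1 \to_\omega t'_2$ with $t'_2 \leq t_2$.
We have $\omega(t_1) = \omega(t'_2) \leq \omega(t_2) = t'$. Hence $\omega(t_1)$ and $u_\Omega$ are compatible. Since
$t_\Omega \to^{n-1}_\Omega t_1 \to^!_\omega \omega(t_1)$, the arrow from $s \to t$ to $u \to v$ exists in $DG^{n-1}_\Omega(\mathcal{R})$. □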

The following result is the key to showing that our nv-approximated depen- 
dency graph is a subgraph of DG^(T^), for all n ^ 0. 

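Lemma 26. Let $s \to t$ and $u \to v$ be dependency pairs of $\mathcal{R}$. If $\Sigma(\mathrm{REN}(t)) \cap
(\to^*_{\mathcal{R}_{nv}})[\Sigma(\mathrm{REN}(u))] \neq \emptyset$ then there is an arrow from $s \to t$ to $u \to v$ in $DG^n_\Omega(\mathcal{R})$
for all $n \geq 0$.

Proof. Suppose $\Sigma(\mathrm{REN}(t)) \cap (\to^*_{\mathcal{R}_{nv}})[\Sigma(\mathrm{REN}(u))] \neq \emptyset$. So $\mathrm{REN}(t)\sigma \to^*_{\mathcal{R}_{nv}}
\mathrm{REN}(u)\tau$ for some ground substitutions $\sigma$ and $\tau$. Since $t_\Omega \leq \mathrm{REN}(t)\sigma$, an
application of Lemma 22 yields $t_\Omega \to^*_\Omega u'$ for some term $u' \leq \mathrm{REN}(u)\tau$. Because also
$u_\Omega \leq \mathrm{REN}(u)\tau$, $u' \uparrow u_\Omega$. Let $m$ be the length of the Ω-reduction sequence from
$t_\Omega$ to $u'$. It follows that there is an arrow from $s \to t$ to $u \to v$ in $DG^n_\Omega(\mathcal{R})$ for all
$n > m$. According to Lemma 25, the arrow exists also in $DG^n_\Omega(\mathcal{R})$ for $n \leq m$. □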



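Theorem 27. $DG_{nv}(\mathcal{R}) \subseteq DG^n_\Omega(\mathcal{R})$ for every TRS $\mathcal{R}$ and $n \geq 0$.

Proof. Suppose there is an arrow from dependency pair $s \to t$ to dependency
pair $u \to v$ in $DG_{nv}(\mathcal{R})$. By definition, $\Sigma(t) \cap (\to^*_{\mathcal{R}_{nv}})[\Sigma(\mathrm{REN}(u))] \neq \emptyset$. Since
$\Sigma(t) \subseteq \Sigma(\mathrm{REN}(t))$, also $\Sigma(\mathrm{REN}(t)) \cap (\to^*_{\mathcal{R}_{nv}})[\Sigma(\mathrm{REN}(u))] \neq \emptyset$. According to
Lemma 26 the arrow from $s \to t$ to $u \to v$ exists in $DG^n_\Omega(\mathcal{R})$ for all $n \geq 0$. □

The reverse inclusion does not hold. Consider for instance the TRS $\mathcal{R}$ of
Example 20. Since $F(\Omega, \Omega, \Omega)$ is compatible with $F(a, b, \Omega)$, $DG^n_\Omega(\mathcal{R})$ contains a
cycle for all $n \geq 0$.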

In retrospect, Kusakari and Toyama's approximation suffers from the following
two problems: (1) since all variables are replaced by $\Omega$, TRSs that are
terminating because of non-linearity cannot be handled appropriately, and (2)
there is no need to bound the number of Ω-reduction steps; rather, by avoiding
such a bound we can make effective use of tree automata techniques.

6 Decidable Classes 

Termination is known to be decidable for several subclasses of TRSs. In this 
section we investigate whether these decidability results can be obtained with 
the dependency pair technique. The best known class of TRSs with a decidable 
termination problem is the class of right-ground TRSs (Dershowitz [9]). The
following easy result states that in principle the dependency pair technique is 
very suitable for deciding termination of right-ground TRSs. 

Theorem 28. A right-ground TRS $\mathcal{R}$ is terminating if and only if $DG(\mathcal{R})$
contains no cycles.

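Proof.

⇒ Suppose $DG(\mathcal{R})$ contains a cycle $\mathcal{C} = \{s_i \to t_i \mid 1 \leq i \leq n\}$. We show that $\mathcal{R}$
is non-terminating. Without loss of generality we assume that $\mathcal{C}$ is minimal.
Since $\mathcal{R}$ is right-ground, there exist substitutions $\sigma_i$ such that $t_i \to^*_{\mathcal{R}} s_{i+1}\sigma_i$
for all $1 \leq i \leq n$ and with $s_{n+1} = s_1$. By definition of dependency pairs, for
every $1 \leq i \leq n$ there exists a rewrite rule $l_i \to r_i \in \mathcal{R}$ and a subterm $u_i$ of
$r_i$ such that $s_i = l_i^\sharp$ and $t_i = u_i^\sharp$. Let $C_i$ be the context such that $r_i = C_i[u_i]$.
Since all steps in $u_i^\sharp \to^*_{\mathcal{R}} l_{i+1}^\sharp\sigma_i$ take place below the root position, we also
have $u_i \to^*_{\mathcal{R}} l_{i+1}\sigma_i$ and thus $r_i = C_i[u_i] \to^*_{\mathcal{R}} C_i[l_{i+1}\sigma_i]$ (with $l_{n+1} = l_1$).
Therefore

$$l_1 \to_{\mathcal{R}} r_1 \to^*_{\mathcal{R}} C_1[l_2\sigma_1] \to_{\mathcal{R}} C_1[r_2] \to^*_{\mathcal{R}} C_1[C_2[l_3\sigma_2]] \to_{\mathcal{R}} \cdots \to^*_{\mathcal{R}} C_1[C_2[\cdots C_n[l_1\sigma_n] \cdots]]$$

which gives rise to an infinite rewrite sequence.

⇐ If there are no cycles in $DG(\mathcal{R})$ then the conditions of Theorem 2 are trivially
satisfied and thus $\mathcal{R}$ is terminating. □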

So as far as termination of right-ground TRSs is concerned, the only thing 
that matters is a good approximation of the dependency graph. Next we consider 
how the various approximations of the dependency graph deal with right-ground 
TRSs. 

Theorem 29. For every left-linear right-ground TRS $\mathcal{R}$, $DG(\mathcal{R}) = DG_{nv}(\mathcal{R})$.

Proof. According to Lemma 11 it suffices to show that $DG_{nv}(\mathcal{R}) \subseteq DG(\mathcal{R})$. So
suppose there is an arrow from dependency pair $s \to t$ to dependency pair $u \to v$
in $DG_{nv}(\mathcal{R})$. Hence $\Sigma(t) \cap (\to^*_{\mathcal{R}_{nv}})[\Sigma(\mathrm{REN}(u))] \neq \emptyset$. Because $\mathcal{R}$ is left-linear and
right-ground, $\mathcal{R}_{nv} = \mathcal{R}$, $t$ is a ground term, and $u$ is linear. Hence $t \in (\to^*_{\mathcal{R}})[\Sigma(u)]$
and thus $t \to^*_{\mathcal{R}} u\sigma$ for some ground substitution $\sigma$. Therefore the arrow from
$s \to t$ to $u \to v$ also exists in $DG(\mathcal{R})$. □

The following example shows that without the left-linearity condition $DG(\mathcal{R})$
and $DG_{nv}(\mathcal{R})$ may differ.

Example 30. Consider the right-ground TRS $\mathcal{R}$ consisting of the three rewrite
rules

    $f(a) \to g(h(a,b))$
    $g(g(a)) \to f(b)$
    $h(x,x) \to g(a)$

There are four dependency pairs:

    $F(a) \to G(h(a,b))$   (1)        $G(g(a)) \to F(b)$   (3)
    $F(a) \to H(a,b)$      (2)        $H(x,x) \to G(a)$    (4)

Because $\mathcal{R}_{nv}$ contains the rewrite rule $h(x,y) \to g(a)$ and $(\mathcal{R}_{nv})^{-1} = (\mathcal{R}^{-1})_{nv}$,
$DG_{nv}(\mathcal{R})$ contains an arrow from (1) to (3). However, since $G(h(a,b))$ does not
rewrite to $G(g(a))$ in $\mathcal{R}$, this arrow does not exist in $DG(\mathcal{R})$.
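
The list of dependency pairs in Example 30 follows mechanically from the rules. The sketch below assumes the standard definition of Arts and Giesl (one pair for every subterm of a right-hand side whose root symbol is defined) and an illustrative encoding of terms as nested tuples with variables as strings; marking a root symbol is modelled by upper-casing its name, and all function names are hypothetical.

```python
def is_var(t):
    return isinstance(t, str)

def subterms(t):
    """All subterms of t in preorder."""
    yield t
    if not is_var(t):
        for arg in t[1:]:
            yield from subterms(arg)

def mark(t):
    """Replace the root symbol f by its marked version F."""
    return (t[0].upper(),) + t[1:]

def dependency_pairs(rules):
    """One pair (mark(l), mark(u)) per rule l -> r and subterm u of r with defined root."""
    defined = {l[0] for l, _ in rules}
    return [(mark(l), mark(u))
            for l, r in rules
            for u in subterms(r)
            if not is_var(u) and u[0] in defined]

a, b = ("a",), ("b",)
example_30 = [
    (("f", a), ("g", ("h", a, b))),   # f(a) -> g(h(a,b))
    (("g", ("g", a)), ("f", b)),      # g(g(a)) -> f(b)
    (("h", "x", "x"), ("g", a)),      # h(x,x) -> g(a)
]
pairs = dependency_pairs(example_30)
```

For this input `pairs` contains the four pairs (1) to (4) of Example 30, in that order.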

Note that in the previous example $\mathcal{R}_g$ also contains the rule $h(x,y) \to g(a)$
but the corresponding rule in $(\mathcal{R}^{-1})_g$ is $g(a) \to h(x,x)$ and therefore $G(g(a))$
does not belong to $(\to^*_{(\mathcal{R}^{-1})_g})[\{G(h(a,b))\}]$. Hence there is no arrow from (1) to
(3) in $DG_g(\mathcal{R})$. This holds in general.

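Theorem 31. For every right-ground TRS $\mathcal{R}$, $DG(\mathcal{R}) = DG_g(\mathcal{R})$.

Proof. According to Lemma 11 it suffices to show that $DG_g(\mathcal{R}) \subseteq DG(\mathcal{R})$. So
suppose there is an arrow from dependency pair $s \to t$ to dependency pair $u \to v$
in $DG_g(\mathcal{R})$. Hence $\Sigma(u) \cap (\to^*_{(\mathcal{R}^{-1})_g})[\Sigma(\mathrm{REN}(t))] \neq \emptyset$. Because $\mathcal{R}$ is right-ground,
$(\mathcal{R}^{-1})_g = \mathcal{R}^{-1}$ and $t$ is a ground term. Hence $\Sigma(u) \cap (\to^*_{\mathcal{R}^{-1}})[\{t\}] \neq \emptyset$ and thus
$t \to^*_{\mathcal{R}} u\sigma$ for some ground substitution $\sigma$. Therefore the arrow from $s \to t$ to
$u \to v$ also exists in $DG(\mathcal{R})$. □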

The above results provide an easy decision procedure for termination of
right-ground TRSs $\mathcal{R}$: compute the dependency graph of $\mathcal{R}$ using the growing (nv, if
$\mathcal{R}$ is left-linear) approximation and determine whether there are any cycles. We
stress that the above results are not true for the estimated dependency graph.
Recently, Nagaya and Toyama [24] obtained the following decidability result.

Theorem 32. Termination is decidable for almost orthogonal growing TRSs. 

□ 

It should be noted that this result does not cover the preceding results due
to the almost orthogonality requirement. (A TRS is called almost orthogonal
if it is left-linear and all critical pairs are trivial overlays.) On the other hand,
although it is very easy to prove that $DG(\mathcal{R}) = DG_g(\mathcal{R})$ for every left-linear
growing TRS $\mathcal{R}$, the dependency pair approach does not seem to give an easy
decision procedure since the dependency graph may contain cycles, as shown in
the following example.

Example 33. Consider the (almost) orthogonal growing TRS $\mathcal{R}$ consisting of the
two rewrite rules

    $f(x) \to g(x)$
    $g(a) \to f(b)$

There are two dependency pairs:

    $F(x) \to G(x)$   (1)
    $G(a) \to F(b)$   (2)

One easily verifies that $DG(\mathcal{R})$ contains a cycle:

    $(1) \rightleftarrows (2)$

However, $\mathcal{R}$ is clearly terminating (and it is very easy to solve the constraints
stemming from the dependency pair technique in Theorem 2).

7 Conclusion 

In this paper we have shown that simple tree automata techniques are useful 
to obtain better approximations of the dependency graph and hence we can 
automatically prove termination of a larger class of TRSs. More sophisticated 
tree automata techniques have been developed for dealing with non-linearity, 
see [8, Chapter 4], but we are not aware of any preservation results for the 
corresponding language classes and hence it is unclear whether these techniques 
could further improve automatic termination techniques. 

Obviously, our α-approximated dependency graphs are harder to compute
than the estimated dependency graph of Arts and Giesl. Consequently, we do 
not propose to eliminate the estimated dependency graph. Rather, our approxi- 
mations should be tried only if tools based on the estimated dependency graph 
(like [1]) fail to prove termination or maybe in parallel to the search for suitable 
argument filterings and orderings to satisfy the resulting constraints. Clearly 
experimentation is needed to determine when to invoke our approximations. 
Currently we are working on an implementation of our algorithms. 

It is worthwhile to investigate whether our approach can be extended to AC 
termination ([21,23]) and to innermost termination ([4]). For AC termination 
we do not expect any problems, but innermost termination seems more difficult. 
The reason is that the existence of an arrow from $s \to t$ to $u \to v$ in the
innermost dependency graph does not only depend on whether a ground instance
$t\sigma$ of $t$ innermost rewrites to a ground instance $u\tau$ of $u$, but $s\sigma$ and $u\tau$ are
additionally required to be normal forms. The latter condition is easily verified by
tree automata techniques but it is unclear how to deal with the synchronization
between the two conditions.

Since there are numerous examples of terminating TRSs whose dependency 
graphs do contain cycles, it goes without saying that the work reported in this 
paper is not the final answer to the problem of proving termination of rewrite 
systems automatically. 

Acknowledgements. I thank Seitaro Yuuki for useful discussions. The paper 
benefitted from detailed comments of Jürgen Giesl and the anonymous referees.
Abstract. This paper considers finite-automata based algorithms for
handling linear arithmetic with both real and integer variables. Previous 
work has shown that this theory can be dealt with by using finite au- 
tomata on infinite words, but this involves some difficult and delicate to 
implement algorithms. The contribution of this paper is to show, using 
topological arguments, that only a restricted class of automata on infinite 
words are necessary for handling real and integer linear arithmetic. This 
allows the use of substantially simpler algorithms and opens the path 
to the implementation of a usable system for handling this combined 
theory. 



1 Introduction 

Among the techniques used to develop algorithms for deciding or checking logi- 
cal formulas, finite automata have played an important role in a variety of cases. 
Classical examples are the use of infinite-word finite automata by Büchi [Büc62]
for obtaining decision procedures for the first and second-order monadic theories 
of one successor as well as the use of tree automata by Rabin [Rab69] for decid- 
ing the second-order monadic theory of n successors. More recent examples are 
the use of automata for obtaining decision and model-checking procedures for 
temporal and modal logics [VW86a,VW86b,VW94,KVW00]. In this last setting, 
automata-based procedures have the advantage of moving the combinatorial as- 
pects of the procedures to the context of automata, which are simple graph-like 
structures well adapted to algorithmic development. This separation of concerns 
between the logical and the algorithmic has been quite fruitful for instance in 
the implementation of model checkers for linear-time temporal logic [CVWY90, 
Hol97]. 

As already noticed by Büchi [Büc60,Büc62], automata-based approaches are
not limited to sequential and modal logics, but can also be used for Presburger 
arithmetic. To achieve this, one adopts the usual encoding of integers in a base 

* This work was partially funded by a grant of the “Communauté française de Belgique
- Direction de la recherche scientifique - Actions de recherche concertées”.
$r \geq 2$, thus representing an integer as a word over the alphabet $\{0, \ldots, r-1\}$.
By extension, n-component integer vectors are represented by words over the
alphabet $\{0, \ldots, r-1\}^n$ and a finite automaton operating over this alphabet
represents a set of integer vectors. Given that addition and order are easily rep- 
resented by finite automata and that these automata are closed under Boolean 
operations as well as projection, one easily obtains a decision procedure for Pres- 
burger arithmetic. This idea was first explored at the theoretical level, yielding 
for instance the very nice result that base-independent finite-automaton repre- 
sentable sets are exactly the Presburger sets [Cob69,Sem77,BHMV94]. Later, it 
has been proposed as a practical means of deciding and manipulating Presburger 
formulas [BC96,Boi98,SKR98,WB00]. The intuition behind this applied use of 
automata for Presburger arithmetic is that finite automata play with respect 
to Presburger arithmetic a role similar to the one of Binary Decision Diagrams 
(BDDs) with respect to Boolean logic. These ideas have been implemented in the 
LASH tool [LASH], which has been used successfully in the context of verifying 
systems with unbounded integer variables. 
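
To make the claim that addition is easily represented by a finite automaton concrete, the following Python sketch simulates a two-state automaton (the state is the current carry) that reads digit triples and accepts exactly the encodings of triples with x + y = z. Several choices here are assumptions made for brevity and are not the conventions of the tools cited above: only non-negative integers are handled, digits are read least significant first, and the names `digits_lsb` and `accepts_sum` are hypothetical.

```python
def digits_lsb(n, r, length):
    """The first `length` base-r digits of n, least significant digit first."""
    return [(n // r**i) % r for i in range(length)]

def accepts_sum(x, y, z, r=2):
    """Run the carry automaton on the digit triples of (x, y, z)."""
    # bit_length() + 1 digits suffice for every base r >= 2.
    length = max(x, y, z).bit_length() + 1
    carry = 0  # automaton state: 0 (initial and accepting) or 1
    for a, b, c in zip(digits_lsb(x, r, length),
                       digits_lsb(y, r, length),
                       digits_lsb(z, r, length)):
        total = a + b + carry
        if total % r != c:
            return False  # no transition on this letter: reject
        carry = total // r
    return carry == 0

ok = accepts_sum(5, 7, 12)
bad = accepts_sum(5, 7, 13)
```

The automaton has two states regardless of the size of the numbers, which is the sense in which the representation is finite; Boolean operations and projection then act on such automata rather than on formulas.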

It almost immediately comes to mind that if a finite word over the alphabet 
{0, . . . , r — 1} can represent an integer, an infinite word over the same alphabet 
extended with a fractional part separator (the usual dot) can represent a real 
number. Finite automata on infinite words can thus represent sets of real vec- 
tors, and serve as a means of obtaining a decision procedure for real additive 
arithmetic. Furthermore, since numbers with empty fractional parts can easily 
be recognized by automata, the same technique can be used to obtain a decision 
procedure for a theory combining the integers and the reals. This is not presently 
handled by any tool, but can be of practical use, for instance in the verification 
of timed systems using integer variables [BBR97]. However, turning this into 
an effective implemented system is not as easy as it might first seem. Indeed, 
projecting and complementing finite automata on infinite words is significantly 
more difficult than for automata on finite words. Projection yields nondetermin- 
istic automata and complementing or determinizing infinite-word automata is a 
notoriously difficult problem. A number of algorithms have been proposed for 
this [Büc62,SVW87,Saf88,KV97], but even though their theoretical complexity
remains simply exponential as in the finite-word case, it moves up from $2^{O(n)}$
to $2^{O(n \log n)}$ and none of the proposed algorithms are as easy to implement and
fine-tune as the simple Rabin-Scott subset construction used in the finite-word
case.

However, it is intuitively surprising that handling reals is so much more 
difficult than handling integers, especially in light of the fact that the usual 
polyhedra-based approach to handling arithmetic is both of lower complexity 
and easier to implement for the reals than for the integers [FR79]. One would 
expect that handling reals with automata should be no more difficult than
handling integers¹. The conclusion that comes out of these observations is that

¹ Note that one cannot expect reals to be easier to handle with automata than integers
since, by nature, this representation includes explicit information about the existence
of integer values satisfying the represented formula.




On the Use of Weak Automata 613 



infinite-word automata constructed from linear arithmetic formulas must have a 
special structure that makes them easier to manipulate than general automata 
on infinite words. That this special structure exists and that it can be exploited to
obtain simpler algorithms is precisely the subject of this paper. 

As a starting point, let us look at the topological characterization of the sets 
definable by linear arithmetic formulas. Let us first consider a formula involving 
solely real variables. If the formula is quantifier free, it is a Boolean combination 
of linear constraints and thus defines a set which is a finite Boolean combina- 
tion of open and closed sets. Now, since real linear arithmetic admits quantifier 
elimination, the same property also holds for quantified formulas. Then, looking 
at classes of automata on infinite words, one notices that the most restricted 
one that can accept Boolean combinations of open and closed sets is the class 
of deterministic weak automata [SW74,Sta83]. These accept all ω-regular sets
in the Borel class $F_\sigma \cap G_\delta$ and hence also finite Boolean combinations of open
and closed sets. So, with some care about moving from the topology on vec- 
tors to the topology on their encoding as words, one can conclude that the sets 
representable by arithmetic formulas involving only real variables can always be 
accepted by deterministic weak automata on infinite words. If integers are also 
involved in the formula, there is no established quantifier elimination result for 
the combined theory and one cannot readily conclude the same. A first result 
in this paper closes this loophole. It establishes that sets definable by quantified 
linear arithmetic formulas involving both real and integer variables are within 
$F_\sigma \cap G_\delta$ and thus are representable by deterministic weak automata. Rather than
using a quantifier elimination type argument to establish this, our proof relies
on separating the integer and fractional parts of variables and on topological
properties of $F_\sigma \cap G_\delta$.

The problematic part of the operations on automata needed to decide a 
first-order theory is the sequence of projections and complementations needed 
to eliminate a string of quantifiers alternating between existential and universal 
ones. The second result of this paper shows that for sets defined in linear arith- 
metic this can be done with constructions that are simple adaptations of the ones 
used for automata on finite words. Indeed, deterministic weak automata can be 
viewed as either Büchi or co-Büchi automata. The interesting fact is that
co-Büchi automata can be determinized by the “breakpoint” construction [MH84,
KV97], which basically amounts to a product of subset constructions. Thus, one
has a simple construction to project and determinize a weak automaton, yielding
a deterministic co-Büchi automaton, which is easily complemented into a
deterministic Büchi automaton. In the general case, another round of projection will
lead to a nondeterministic Büchi automaton, for which a general determinization
procedure has to be used. However, we have the result that for automata
obtained from linear arithmetic formulas, the represented sets stay within those 
accepted by deterministic weak automata. We prove that this implies that the 
automata obtained after determinization will always be weak. 

Note that this cannot be directly concluded from the fact that the repre- 
sented sets stay within those representable by deterministic weak automata. 
Indeed, even though the represented sets can be accepted by deterministic weak
automata, the automata that are obtained by the determinization procedure
might not have this form. Fortunately, we can prove that this is impossible. For
this, we go back to the link between automata and the topology of the sets of
infinite words they accept. The argument is that ω-regular sets in $F_\sigma \cap G_\delta$ have
a topological property that forces the automata accepting them to be inherently
weak, i.e. not to have strongly connected components containing both accepting
and non-accepting cycles.

As a consequence of our results, we obtain a decision procedure for the theory 
combining integer and real linear arithmetic that is suitable for implementation. 
The fact that this theory is decidable was known [BBR97], but the results of this 
paper move us much closer to an implemented tool that can handle it effectively. 

2 Automata-Theoretic and Topological Background 

In this section we recall some automata-theoretic and topological concepts that 
are used in the paper. 

2.1 Automata on Infinite Words 

An infinite word (or ω-word) $w$ over an alphabet $\Sigma$ is a mapping $w : \mathbb{N} \to \Sigma$ from
the natural numbers to $\Sigma$. A Büchi automaton on infinite words is a five-tuple
$A = (Q, \Sigma, \delta, q_0, F)$, where

- $Q$ is a finite set of states;

- $\Sigma$ is the input alphabet;

- $\delta$ is the transition function and is of the form $\delta : Q \times \Sigma \to 2^Q$ if the automaton
  is nondeterministic and of the form $\delta : Q \times \Sigma \to Q$ if the automaton is
  deterministic;

- $q_0$ is the initial state;

- $F$ is a set of accepting states.

A run $\pi$ of a Büchi automaton $A = (Q, \Sigma, \delta, q_0, F)$ on an ω-word $w$ is a
mapping $\pi : \mathbb{N} \to Q$ that satisfies the following conditions:

- $\pi(0) = q_0$, i.e. the run starts in the initial state;

- For all $i \geq 0$, $\pi(i+1) \in \delta(\pi(i), w(i))$ (nondeterministic automata) or
  $\pi(i+1) = \delta(\pi(i), w(i))$ (deterministic automata), i.e. the run respects the
  transition function.

Let $\mathit{inf}(\pi)$ be the set of states that occur infinitely often in a run $\pi$. A run
$\pi$ is said to be accepting if $\mathit{inf}(\pi) \cap F \neq \emptyset$. An ω-word $w$ is accepted by a Büchi
automaton if that automaton has some accepting run on $w$. The language $L_\omega(A)$
of infinite words defined by a Büchi automaton $A$ is the set of ω-words it accepts.

A co-Büchi automaton is defined exactly as a Büchi automaton except that
its accepting runs are those for which $\mathit{inf}(\pi) \cap F = \emptyset$.
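
For a deterministic automaton and an ultimately periodic word of the form u·v^ω, the set inf(π) can be computed by simulation, which makes the two acceptance conditions easy to compare. The sketch below is only an illustration: the dictionary encoding of the transition function, the function name `inf_states` and the two-state example automaton (which remembers the last letter read) are assumptions introduced here.

```python
def inf_states(delta, q0, prefix, cycle):
    """States visited infinitely often by the unique run on prefix · cycle^ω.

    delta maps (state, letter) to a state; cycle must be non-empty.
    """
    q = q0
    for letter in prefix:
        q = delta[q, letter]
    first_round = {}   # state at the start of a round -> index of that round
    rounds = []        # states visited during each round of the cycle
    while q not in first_round:
        first_round[q] = len(rounds)
        visited = []
        for letter in cycle:
            q = delta[q, letter]
            visited.append(q)
        rounds.append(visited)
    # From round first_round[q] on, the run repeats forever.
    return {s for visited in rounds[first_round[q]:] for s in visited}

# State B means "the last letter was 1", state A means "the last letter was 0".
delta = {("A", "0"): "A", ("A", "1"): "B", ("B", "0"): "A", ("B", "1"): "B"}
F = {"B"}

inf1 = inf_states(delta, "A", "0", "01")   # the word 0(01)^ω
inf2 = inf_states(delta, "A", "1", "0")    # the word 1 0^ω

buchi_accepts_1 = bool(inf1 & F)       # infinitely many 1s
buchi_accepts_2 = bool(inf2 & F)
cobuchi_accepts_2 = not (inf2 & F)     # finitely many visits to F
```

Read as a Büchi automaton, this example accepts the words with infinitely many 1s; read as a co-Büchi automaton with the same set F, it accepts the complement.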

We will also use the notion of weak automata [MSS86]. For a Büchi automaton
$A = (Q, \Sigma, \delta, q_0, F)$ to be weak, there has to be a partition of its state set $Q$ into
disjoint subsets $Q_1, \ldots, Q_m$ such that

- for each of the $Q_i$ either $Q_i \subseteq F$ or $Q_i \cap F = \emptyset$, and

- there is a partial order $\leq$ on the sets $Q_1, \ldots, Q_m$ such that for every $q \in Q_i$
  and $q' \in Q_j$ for which, for some $a \in \Sigma$, $q' \in \delta(q, a)$ ($q' = \delta(q, a)$ in the
  deterministic case), $Q_j \leq Q_i$.

For more details, a survey of automata on infinite words can be found in 
[Tho90]. 
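
Whether a given automaton is weak in this sense can be tested on its transition graph. The sketch below relies on an observation that is not stated in the text and is used here as an assumption of the example: a suitable partition exists exactly when every strongly connected component is either contained in F or disjoint from F, because two mutually reachable states must lie in the same block. The encoding of the transition graph as a successor dictionary and the names `reach` and `is_weak` are hypothetical.

```python
def reach(succ, s):
    """States reachable from s in zero or more steps."""
    seen, stack = {s}, [s]
    while stack:
        for v in succ.get(stack.pop(), ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def is_weak(states, succ, accepting):
    """True iff no strongly connected component mixes accepting and other states."""
    reachable = {s: reach(succ, s) for s in states}
    for s in states:
        component = {t for t in reachable[s] if s in reachable[t]}
        if component & accepting and component - accepting:
            return False
    return True

# "Infinitely many 1s": both states lie in one component, only B is accepting.
inf_ones = {"A": {"A", "B"}, "B": {"A", "B"}}
# "At least one 1": once B is entered it is never left.
some_one = {"A": {"A", "B"}, "B": {"B"}}

weak1 = is_weak(["A", "B"], inf_ones, {"B"})
weak2 = is_weak(["A", "B"], some_one, {"B"})
```

The quadratic reachability computation keeps the example short; a linear-time strongly connected component algorithm would be the natural choice for larger automata.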

2.2 Topology 

Given a set $S$, a distance $d(x, y)$ defined on this set induces a topology on subsets
of $S$. A neighborhood $N_\varepsilon(x)$ of a point $x \in S$ is the set $N_\varepsilon(x) = \{y \mid d(x, y) < \varepsilon\}$.
A set $C \subseteq S$ is said to be open if for all $x \in C$, there exists $\varepsilon > 0$ such that the
neighborhood $N_\varepsilon(x)$ is contained in $C$. A closed set is a set whose complement
with respect to $S$ is open. We will be referring to the first few levels of the Borel
hierarchy which are shown in Figure 1. The notations used are the following:

- $F$ are the closed sets,

- $G$ are the open sets,

- $F_\sigma$ is the class of countable unions of closed sets,

- $G_\delta$ is the class of countable intersections of open sets,

- $F_{\sigma\delta}$ is the class of countable intersections of $F_\sigma$ sets,

- $G_{\delta\sigma}$ is the class of countable unions of $G_\delta$ sets,

- $\mathcal{B}(X)$ represents the finite Boolean combinations of sets in $X$.

An arrow between classes indicates proper inclusion. 

3 Topological Characterization of Arithmetic Sets 

We consider the theory $(\mathbb{R}, \mathbb{Z}, +, \leq)$, where $+$ represents the predicate $x + y = z$.
Since any linear equality or order constraint can be encoded into this theory, we
refer to it as additive or linear arithmetic over the reals and integers. It is the
extension of Presburger arithmetic that includes both real and integer variables.
In this section, we prove that the sets representable in this theory belong to the
topological class $F_\sigma \cap G_\delta$ defined relatively to the Euclidean distance between
vectors. This result is formalized by the following theorem.

Theorem 1. Let $S \subseteq \mathbb{R}^n$, with $n > 0$, be a set defined in the theory $(\mathbb{R}, \mathbb{Z}, +, \leq)$.
This set belongs to the topological class $F_\sigma \cap G_\delta$ induced by the distance

$$d(x, y) = \left( \sum_{i=1}^{n} (x_i - y_i)^2 \right)^{1/2}.$$

Proof. Since $(\mathbb{R}, \mathbb{Z}, +, \leq)$ is closed under negation, it is actually sufficient to
show that each formula of this theory defines a set that belongs to $F_\sigma$, i.e., a set
that can be expressed as a countable union of closed sets.
Fig. 1. The first few levels of the Borel hierarchy.



Let $\varphi$ be a formula of $(\mathbb{R}, \mathbb{Z}, +, \leq)$. To simplify our argument, we will assume
that all free variables of $\varphi$ are reals. This can be done without loss of generality
since quantified variables can range over both $\mathbb{R}$ and $\mathbb{Z}$. We introduce $u < v$ as
a shorthand for $u \leq v \wedge \neg(u = v)$.

The first step of our proof consists of modifying $\varphi$ in the following way. We
replace each variable $x$ that appears in $\varphi$ by two variables $x_I$ and $x_F$ representing
respectively the integer and the fractional part of $x$. Formally, this operation
replaces each occurrence in $\varphi$ of a free variable $x$ by the sum $x_I + x_F$ while
adding to $\varphi$ the constraints $0 \leq x_F$ and $x_F < 1$, and transforms the quantified
variables of $\varphi$ according to the following rules:

$(\exists x \in \mathbb{R})\phi \;\to\; (\exists x_I \in \mathbb{Z})(\exists x_F \in \mathbb{R})(0 \leq x_F \wedge x_F < 1 \wedge \phi[x/x_I + x_F])$

$(\forall x \in \mathbb{R})\phi \;\to\; (\forall x_I \in \mathbb{Z})(\forall x_F \in \mathbb{R})(x_F < 0 \vee 1 \leq x_F \vee \phi[x/x_I + x_F])$

$(Qx \in \mathbb{Z})\phi \;\to\; (Qx_I \in \mathbb{Z})\phi[x/x_I],$

where $Q \in \{\exists, \forall\}$, $\phi$ is a subformula, and $\phi[x/y]$ denotes the result of replacing
by $y$ each occurrence of $x$ in $\phi$. The transformation has no influence on the set
represented by $\varphi$, except that the integer and fractional parts of each value are
now represented by two distinct variables.

Now, the atomic formulas of $\varphi$ are of the form $p = q + r$, $p = q$ or $p \leq q$,
where $p$, $q$ and $r$ are either integer variables, sums of an integer and of a fractional
variable, or integer constants. The second step consists of expanding these atomic
formulas so as to send into distinct atoms the occurrences of the integer and of the
fractional variables. This is easily done with the help of simple arithmetic rules,
for the truth value of the atomic formulas that involve both types of variables
has only to be preserved for values of the fractional variables that belong to the
interval [0, 1[. The less trivial expansion rules² are given below:

$(x_I + x_F) = (y_I + y_F) \;\to\; x_I = y_I \wedge x_F = y_F$

$(x_I + x_F) \leq (y_I + y_F) \;\to\; x_I < y_I \vee (x_I = y_I \wedge x_F \leq y_F)$

$(x_I + x_F) = (y_I + y_F) + (z_I + z_F) \;\to\; (x_I = y_I + z_I \wedge x_F = y_F + z_F) \vee (x_I = y_I + z_I + 1 \wedge x_F = y_F + z_F - 1)$

$(x_I + x_F) = (y_I + y_F) + z_I \;\to\; x_I = y_I + z_I \wedge x_F = y_F$

$x_I = (y_I + y_F) + (z_I + z_F) \;\to\; (x_I = y_I + z_I \wedge y_F + z_F = 0) \vee (x_I = y_I + z_I + 1 \wedge y_F + z_F = 1)$
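
The expansion rules can be sanity-checked numerically by splitting rational test values into integer and fractional parts. The Python sketch below is only such a check and not part of the proof: it uses exact rational arithmetic, tests the rules for $\leq$ and for the sum of two mixed terms on a small grid of values, and the function names `split`, `rule_leq` and `rule_sum` are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import floor

def split(x):
    """Integer part and fractional part in [0, 1) of a rational x."""
    xi = floor(x)
    return xi, x - xi

def rule_leq(x, y):
    """Right-hand side of the rule for (x_I + x_F) <= (y_I + y_F)."""
    (xi, xf), (yi, yf) = split(x), split(y)
    return xi < yi or (xi == yi and xf <= yf)

def rule_sum(x, y, z):
    """Right-hand side of the rule for (x_I + x_F) = (y_I + y_F) + (z_I + z_F)."""
    (xi, xf), (yi, yf), (zi, zf) = split(x), split(y), split(z)
    return ((xi == yi + zi and xf == yf + zf)
            or (xi == yi + zi + 1 and xf == yf + zf - 1))

# Quarter steps between -2 and 2, including negative values.
values = [Fraction(n, 4) for n in range(-8, 9)]

leq_ok = all((x <= y) == rule_leq(x, y) for x, y in product(values, repeat=2))
sum_ok = all((x == y + z) == rule_sum(x, y, z)
             for x, y, z in product(values, repeat=3))
```

The second disjunct of the sum rule covers the case where the two fractional parts add up to at least 1 and a carry moves into the integer part.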

After the transformation, each atomic formula of $\varphi$ is either a formula $\phi_I$
involving only integer variables or a formula $\phi_F$ over fractional variables. We
now distribute existential (resp. universal) quantifiers over disjunctions (resp.
conjunctions), after rewriting their argument into disjunctive (resp. conjunctive)
normal form, and then apply the simplification rules

$(Qx_I \in \mathbb{Z})(\phi_I \,\alpha\, \phi_F) \;\to\; (Qx_I \in \mathbb{Z})(\phi_I) \,\alpha\, \phi_F$

$(Qx_F \in \mathbb{R})(\phi_I \,\alpha\, \phi_F) \;\to\; \phi_I \,\alpha\, (Qx_F \in \mathbb{R})(\phi_F),$

where $Q \in \{\exists, \forall\}$ and $\alpha \in \{\vee, \wedge\}$.

Repeating this operation, we eventually get a formula $\varphi$ that takes the form
of a finite Boolean combination $\mathcal{B}(\phi_I^{(1)}, \ldots, \phi_I^{(m)}, \phi_F^{(1)}, \ldots, \phi_F^{(m')})$ of
subformulas $\phi_I^{(i)}$ and $\phi_F^{(i)}$ that involve respectively only integer and fractional
variables.

Let $x_I^{(1)}, x_I^{(2)}, \ldots, x_I^{(k)}$ be the free integer variables of $\psi$. For each assignment
of values to these variables, the subformulas $\phi_I^{(i)}$ are each identically true or
false, hence, under each such assignment, $\psi$ reduces to a finite Boolean
combination of the subformulas $\phi_F^{(i)}$ alone.

Each subformula $\phi_F^{(i)}$ belongs to the theory $\langle \mathbb{R}, +, \le, 1 \rangle$, which admits the
elimination of quantifiers [FR79]. The sets of real vectors satisfying these
formulas are thus finite Boolean combinations of linear constraints with open or
closed boundaries. It follows that, for each $(a_1, \ldots, a_k) \in \mathbb{Z}^k$, the set described
by $\psi[x_I^{(1)}/a_1, \ldots, x_I^{(k)}/a_k]$ is a finite Boolean combination of open and closed
sets and, since any open set is a countable union of closed sets, is within $F_\sigma$.
Therefore, the set described by $\psi$ is a countable union of $F_\sigma$ sets and is also
within $F_\sigma$.

¹ In these rules, the expression $p = q + r + s$ is introduced as a shorthand for $(\exists u)(u =
q + r \wedge p = u + s)$, where the quantifier is defined over the appropriate domain.



4 Representing Sets of Integers and Reals with Finite 
Automata 

In this section, we recall the finite-state representation of sets of real vectors as 
introduced in [BBR97]. 

In order to make a finite automaton recognize numbers, one needs to establish
a mapping between these and words. Our encoding scheme corresponds to the
usual notation for reals and relies on an arbitrary integer base $r > 1$. We encode
a number $x$ in base $r$, most significant digit first, by words of the form $w_I \star w_F$,
where $w_I$ encodes the integer part $x_I$ of $x$ as a finite word over $\{0, \ldots, r-1\}$,
the special symbol “$\star$” is a separator, and $w_F$ encodes the fractional part $x_F$ of
$x$ as an infinite word over $\{0, \ldots, r-1\}$. Negative numbers are represented by
their $r$'s complement. The length $p$ of $w_I$, which we refer to as the integer-part
length of $w$, is not fixed but must be large enough for $-r^{p-1} \le x_I < r^{p-1}$ to
hold.

According to this scheme, each number has an infinite number of encodings, 
since their integer-part length can be increased unboundedly. In addition, the 
rational numbers whose denominator has only prime factors that are also factors 
of r have two distinct encodings with the same integer-part length. For example, 
in base 10, the number 11/2 has the encodings $005 \star 5(0)^\omega$ and $005 \star 4(9)^\omega$,
“$\omega$” denoting infinite repetition.
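The encoding scheme can be illustrated by a short Python sketch. It is not taken from the paper: the function name `encode` and its parameters are hypothetical, and since the fractional part is an infinite word, only a finite prefix of `frac_digits` digits is produced. The sketch always yields the encoding obtained by repeated multiplication, so for a number such as 11/2 it returns a prefix of $005 \star 5(0)^\omega$ and never of the dual encoding $005 \star 4(9)^\omega$.

```python
from fractions import Fraction
from math import floor


def encode(x, r=10, p=3, frac_digits=6):
    """Return (integer digits, fractional digits) of x in base r.

    The integer part uses r's complement on p digits; the fractional
    digits are a finite prefix of the infinite fractional word.
    """
    x = Fraction(x)
    x_i = floor(x)          # integer part, possibly negative
    x_f = x - x_i           # fractional part, always in [0, 1)
    if not -r ** (p - 1) <= x_i < r ** (p - 1):
        raise ValueError("integer-part length p is too small")
    value = x_i % r ** p    # r's complement of the integer part
    int_digits = []
    for _ in range(p):
        value, digit = divmod(value, r)
        int_digits.append(digit)
    int_digits.reverse()    # most significant digit first
    frac = []
    for _ in range(frac_digits):
        x_f *= r
        digit = floor(x_f)
        frac.append(digit)
        x_f -= digit
    return int_digits, frac


eleven_halves = encode(Fraction(11, 2))
minus_five_halves = encode(Fraction(-5, 2), p=2, frac_digits=2)
```

With these definitions, the leading digit of the integer part is 0 for non-negative numbers and $r - 1$ for negative ones, which is why increasing the integer-part length only repeats the leading symbol.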

To encode a vector of real numbers, we represent each of its components 
by words of identical integer-part length. This length can be chosen arbitrarily, 
provided that it is sufficient for encoding the vector component with the highest 
magnitude. An encoding of a vector $x \in \mathbb{R}^n$ can indifferently be viewed either as
an $n$-tuple of words of identical integer-part length over the alphabet $\{0, \ldots, r-1, \star\}$,
or as a single word $w$ over the alphabet $\{0, \ldots, r-1\}^n \cup \{\star\}$.

Since a real vector has an infinite number of possible encodings, we have to 
choose which of these the automata will recognize. A natural choice is to accept 
all encodings. This leads to the following definition. 

Definition 1. Let $n > 0$ and $r > 1$ be integers. A Real Vector Automaton
(RVA) $A$ in base $r$ for vectors in $\mathbb{R}^n$ is a Büchi automaton over the alphabet
$\{0, \ldots, r-1\}^n \cup \{\star\}$, such that

- Every word accepted by $A$ is an encoding in base $r$ of a vector in $\mathbb{R}^n$, and

- For every vector $x \in \mathbb{R}^n$, $A$ accepts either all the encodings of $x$ in base $r$,
  or none of them.

An RVA is said to represent the set of vectors encoded by the words that
belong to its accepted language. Efficient algorithms have been developed for
constructing RVA representing the sets of solutions of systems of linear equations
and inequations [BRW98]. Since it is immediate to constrain a number to be
an integer with an RVA and since, using existing algorithms for infinite-word
automata, one can apply Boolean operations as well as projection to RVA, it
follows that one can construct an RVA for any formula of the arithmetic theory 
we are considering. 



5 Weak Automata and Their Properties 

If one examines the constructions given in [BRW98] to build RVA for linear 
equations and inequations, one notices that they have the property that all 
states within the same strongly connected component are either accepting or 
nonaccepting. This implies that these automata are weak in the sense of [MSS86] 
(see Section 2). 

Weak automata have a number of interesting properties. A first one is that
they can be represented both as Büchi and co-Büchi. Indeed, a weak Büchi
automaton $A = (Q, \Sigma, \delta, q_0, F)$ is equivalent to the co-Büchi automaton $A =
(Q, \Sigma, \delta, q_0, Q \setminus F)$, since a run eventually remains within a single component
$Q_i$ in which all states have the same status with respect to being accepting. A
consequence of this is that weak automata can be determinized by the fairly
simple “breakpoint” construction [MH84,KV97] that can be used for co-Büchi
automata. This construction is the following.

Let $A = (Q, \Sigma, \delta, q_0, F)$ be a nondeterministic co-Büchi automaton. The
deterministic co-Büchi automaton $A' = (Q', \Sigma, \delta', q'_0, F')$ defined as follows accepts
the same $\omega$-language.

- $Q' = 2^Q \times 2^Q$, the states of $A'$ are pairs of sets of states of $A$.

- $q'_0 = (\{q_0\}, \emptyset)$.

- For $(S, R) \in Q'$ and $a \in \Sigma$, the transition function is defined by

  - if $R = \emptyset$, then $\delta'((S, R), a) = (T, T \setminus F)$ where $T = \{q \mid \exists p \in S \text{ and } q \in
    \delta(p, a)\}$, $T$ is obtained from $S$ as in the classical subset construction, and
    the second component of the pair of sets of states is obtained from $T$ by
    eliminating states in $F$;

  - if $R \neq \emptyset$, then $\delta'((S, R), a) = (T, U \setminus F)$ where $T = \{q \mid \exists p \in S \text{ and } q \in
    \delta(p, a)\}$, and $U = \{q \mid \exists p \in R \text{ and } q \in \delta(p, a)\}$, the subset construction
    is now applied to both $S$ and $R$ and states in $F$ are removed from $U$.

- $F' = 2^Q \times \{\emptyset\}$.

When the automaton $A'$ is in a state $(S, R)$, $R$ represents the states of $A$ that
can be reached by a run that has not gone through a state in $F$ since the last
“breakpoint”, i.e. state of the form $(S, \emptyset)$. So, for a given word, $A$ has a run that
does not go infinitely often through a state in $F$ iff $A'$ has a run that does not
go infinitely often through a state in $F'$. Notice that the difficulty that exists
for determinizing Büchi automata, which is to make sure that the same run
repeatedly reaches an accepting state, disappears since, for co-Büchi automata,
we are just looking for a run that eventually avoids accepting states.
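The construction above can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: the function name `breakpoint_determinize` and the dictionary-based representation of $\delta$ are assumptions made for the example, and only the reachable part of $A'$ is built.

```python
from collections import deque


def breakpoint_determinize(alphabet, delta, q0, F):
    """Breakpoint construction for a nondeterministic co-Buchi automaton.

    delta maps (state, symbol) to a set of successors; a missing entry
    means that there is no successor. Returns the reachable part of A'
    as (states, delta_det, initial, F_det).
    """
    F = frozenset(F)

    def post(states, a):
        # Classical subset-construction step.
        return frozenset(q for p in states for q in delta.get((p, a), ()))

    initial = (frozenset({q0}), frozenset())
    delta_det = {}
    seen = {initial}
    queue = deque([initial])
    while queue:
        S, R = queue.popleft()
        for a in alphabet:
            T = post(S, a)
            # At a breakpoint (R empty) restart from T, otherwise follow R.
            U = T if not R else post(R, a)
            target = (T, U - F)
            delta_det[((S, R), a)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    F_det = {state for state in seen if not state[1]}
    return seen, delta_det, initial, F_det


# Example automaton: a run can eventually avoid state 0 iff the word
# contains only finitely many b's.
example_delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {1}}
states, delta_det, initial, F_det = breakpoint_determinize(
    "ab", example_delta, 0, {0})
```

On this example the deterministic automaton has two reachable states, $(\{0\}, \emptyset)$ and $(\{0, 1\}, \{1\})$, and every occurrence of `b` leads back to the breakpoint state.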

It is interesting to notice that the construction implies that all reachable
states $(S, R)$ of $A'$ satisfy $R \subseteq S$. The breakpoint construction can thus be
implemented as a subset construction in which the states in $R$ are simply tagged.
One can thus expect it to behave in practice very similarly to the traditional 
subset construction for finite-word automata. 

Another property of weak automata that will be of particular interest to us
is the topological characterization of the sets of words that they can accept.
Consider the topology on the set of $\omega$-words induced by the distance

$$d(w, w') = \begin{cases} \dfrac{1}{|common(w, w')| + 1} & \text{if } w \neq w' \\ 0 & \text{if } w = w', \end{cases}$$

where $|common(w, w')|$ denotes the length of the longest common prefix of $w$
and $w'$. In this topology, weak deterministic automata accept exactly the
$\omega$-regular languages that are in $F_\sigma \cap G_\delta$. This follows from the results on the
Staiger-Wagner class of automata [SW74,Sta83], which coincides with the class
of deterministic weak automata, as can be inferred from [SW74] and is shown
explicitly in [MS97]. Given the result proved in Section 3, it is tempting to
conclude that the encodings of sets definable in the theory $\langle \mathbb{R}, \mathbb{Z}, +, \le \rangle$ can always
be accepted by weak deterministic automata. This conclusion is correct, but
requires shifting the result from the topology on numbers to the topology on
words, which we will do in the next section. In the meantime, we need one more
result in order to be able to benefit algorithmically from the fact that we are
dealing with $F_\sigma \cap G_\delta$ sets, namely that any deterministic automaton accepting an
$F_\sigma \cap G_\delta$ set is essentially a weak automaton.

Consider the following definition. 

Definition 2. A Büchi automaton is inherently weak if none of the reachable
strongly connected components of its transition graph contains both accepting
(including at least one accepting state) and non accepting (not including any
accepting state) cycles.

Clearly, if an automaton is inherently weak, it can directly be transformed into 
a weak automaton. The partition of the state set is its partition into strongly 
connected components and all the states of a component are made accepting or 
not, depending on whether the cycles in that component are accepting or not. 
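Definition 2 can be checked directly on the transition graph. The Python sketch below is an illustration under simplifying assumptions rather than an algorithm from the paper: transition labels are dropped, the graph is given as a hypothetical dictionary `edges` from states to successor sets, and reachability is recomputed naively, which is adequate only for small automata.

```python
def reach_plus(edges, start, allowed):
    """States reachable from start via at least one edge, staying in allowed."""
    seen, stack = set(), [start]
    while stack:
        p = stack.pop()
        for q in edges.get(p, ()):
            if q in allowed and q not in seen:
                seen.add(q)
                stack.append(q)
    return seen


def is_inherently_weak(edges, initial, accepting):
    """True iff no reachable component has both kinds of cycles."""
    accepting = set(accepting)
    all_states = set(edges) | {initial}
    all_states |= {q for targets in edges.values() for q in targets}
    reachable = {initial} | reach_plus(edges, initial, all_states)
    forward = {q: reach_plus(edges, q, reachable) for q in reachable}
    for f in reachable & accepting:
        if f not in forward[f]:
            continue  # f is on no cycle, so it yields no accepting cycle
        # Strongly connected component of f.
        component = {q for q in forward[f] if f in forward[q]}
        rejecting = component - accepting
        # A non accepting cycle uses non accepting states only.
        if any(u in reach_plus(edges, u, rejecting) for u in rejecting):
            return False
    return True


# "Infinitely many a": state 1 is entered on a, state 0 on b.
infinitely_many_a = is_inherently_weak({0: {0, 1}, 1: {0, 1}}, 0, {1})
# "At least one a": state 1 is absorbing and accepting.
at_least_one_a = is_inherently_weak({0: {0, 1}, 1: {1}}, 0, {1})
# Every cycle passes through the accepting state 0.
alternating = is_inherently_weak({0: {1}, 1: {0}}, 0, {0})
```

The last example is inherently weak without being weak, since its single component mixes an accepting and a non accepting state; making both states accepting gives the equivalent weak automaton described above.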

We will now prove the following. 

Theorem 2. Any deterministic Büchi automaton that accepts a language in
$F_\sigma \cap G_\delta$ is inherently weak.

To prove this, we use the fact that the language accepted by an automaton 
that is not inherently weak must have the following dense oscillating sequence 
property. 



Definition 3. A language $L \subseteq \Sigma^\omega$ has the dense oscillating sequence
property if, $w_1, w_2, w_3, \ldots$ being words and $\varepsilon_1, \varepsilon_2, \varepsilon_3, \ldots$ being distances, one has that
$\exists w_1 \forall \varepsilon_1 \exists w_2 \forall \varepsilon_2 \cdots$ such that $d(w_i, w_{i+1}) \le \varepsilon_i$ for all $i \ge 1$, $w_i \in L$ for all odd $i$,
and $w_i \notin L$ for all even $i$.

The fact that the language accepted by an automaton that is not inherently 
weak has the dense oscillating sequence property is an immediate consequence of 
the fact that such an automaton has a reachable strongly connected component 
containing both accepting and non accepting cycles. Given this, it is sufficient 
to prove the following lemma in order to establish Theorem 2. 

Lemma 1. An $\omega$-regular language that has the dense oscillating sequence
property cannot be accepted by a weak deterministic automaton and hence is not in
$F_\sigma \cap G_\delta$.

Proof. We proceed by contradiction. Assume that a language $L$ having the dense
oscillating sequence property is accepted by a weak deterministic automaton $A$.
Consider the first word $w_1$ in a dense oscillating sequence for $L$. The run of $A$
on this word eventually reaches an accepting component $Q_{i_1}$ of the partition of
the state set of $A$ and will stay within this component. Since $\varepsilon_1$ can be chosen freely, it can
be taken small enough for the run of $A$ on $w_2$ to also reach the component $Q_{i_1}$
before it starts to differ from $w_1$. Since $w_2$ is not in $L$, the run of $A$ on $w_2$
has to eventually leave the component $Q_{i_1}$, and will eventually reach and stay
within a non accepting component $Q_{i_2} < Q_{i_1}$. Repeating a similar argument,
one can conclude that the run of $A$ on $w_3$ eventually reaches and stays within
an accepting component $Q_{i_3} < Q_{i_2}$. Carrying on with this line of reasoning, one
concludes that the state set of $A$ must contain an infinite decreasing sequence
of distinct components, which is impossible given that it is finite.



6 Deciding Linear Arithmetic with Real and Integer 
Variables 

We first show that the result of Section 3 also applies to the sets of words
encoding sets defined in $\langle \mathbb{R}, \mathbb{Z}, +, \le \rangle$. In order to do so, we need to establish that
the topological class $F_\sigma \cap G_\delta$ defined over sets of reals is mapped to its $\omega$-word
counterpart by the encoding relation described in Section 4.

Theorem 3. Let $n > 0$ and $r > 1$ be integers, and let $L(S) \subseteq (\{0, \ldots, r-1\}^n \cup
\{\star\})^\omega$ be the set of all the encodings in base $r$ of the vectors belonging to the set
$S \subseteq \mathbb{R}^n$. If the set $S$ belongs to $F_\sigma \cap G_\delta$ (with respect to Euclidean distance),
then the language $L(S)$ belongs to $F_\sigma \cap G_\delta$ (with respect to $\omega$-word distance).

Proof. Not all words over the alphabet $\{0, \ldots, r-1\}^n \cup \{\star\}$ encode a real vector.
Let $V$ be the set of all the valid encodings of vectors in base $r$. Its complement
$\overline{V}$ can be partitioned into a set $\overline{V}_0$ containing only words in which the separator
“$\star$” does not appear, and a set $\overline{V}_+$ containing words in which “$\star$” occurs at least
once.

The set $\overline{V}_0 \cup V$ is closed. Indeed, each element of its complement is a word
that does not encode validly a vector and that contains at least one separator.
Such a word admits a neighborhood entirely composed of words satisfying the
same property, which entails that the complement of $\overline{V}_0 \cup V$ is open. In the same
way, one obtains that the set $\overline{V}_+ \cup V$ is open.

Let us now consider an open set $S \subseteq \mathbb{R}^n$. The language $L' = L(S) \cup \overline{V}_+$ is open.
Indeed, each word $w \in L(S)$ has a neighborhood entirely composed of words in
$L(S)$ (formed by the encodings of vectors that belong to a neighborhood of the
vector encoded by $w$), and of words that do not encode vectors but contain at
least one separator. Moreover, each word $w \in \overline{V}_+$ admits a neighborhood fully
composed of words in $\overline{V}_+$. Since $L(S) = L' \cap (\overline{V}_0 \cup V)$, we have that $L(S)$ is the
intersection of an open and of a closed set.

The same line of reasoning can be followed with a closed set $S \subseteq \mathbb{R}^n$. The
language $L'' = L(S) \cup \overline{V}_0$ is easily shown to be closed, which, since $L(S) =
L'' \cap (\overline{V}_+ \cup V)$, implies that $L(S)$ is the intersection of a closed and of an open
set.

We are now ready to address the case of a set $S \subseteq \mathbb{R}^n$ that belongs to
$F_\sigma \cap G_\delta$. Since $S$ is in $F_\sigma$, it can be expressed as a countable union of closed
sets $S_1, S_2, \ldots$. The languages $L(S_1), L(S_2), \ldots$ are Boolean combinations of
open and of closed sets, and thus belong to the topological class $F_\sigma$. Therefore,
$L(S) = L(S_1) \cup L(S_2) \cup \cdots$ is a countable union of sets in $F_\sigma$, and thus belongs
itself to $F_\sigma$. Now, since $S$ is in $G_\delta$, it can also be expressed as a countable
intersection of open sets $S'_1, S'_2, \ldots$. The languages $L(S'_1), L(S'_2), \ldots$ belong
to the topological class $G_\delta$. Hence, $L(S) = L(S'_1) \cap L(S'_2) \cap \cdots$ is a countable
intersection of sets in $G_\delta$, and thus belongs itself to $G_\delta$. This concludes our proof
of the theorem.

Knowing that the encodings of sets definable in the theory $\langle \mathbb{R}, \mathbb{Z}, +, \le \rangle$ are
in $F_\sigma \cap G_\delta$, we use the results of Section 5 to conclude the following.

Theorem 4. Every deterministic RVA representing a set definable in $\langle \mathbb{R}, \mathbb{Z}, +,
\le \rangle$ is inherently weak.

This property has the important consequence that the construction and the 
manipulation of RVA obtained from arithmetic formulas can be performed effec- 
tively by algorithms operating on weak automata. Precisely, to obtain an RVA 
for an arithmetic formula one can proceed as follows. 

For equations and inequations, one uses the constructions given in [BRW98] 
to build weak RVA. Computing the intersection, union, and Cartesian prod- 
uct of sets represented by RVA simply reduces to performing similar operations 
with the languages accepted by the underlying automata, which can be done by 
simple product constructions. These operations preserve the weak nature of the 
automata. To complement a weak RVA, one determinizes it using the breakpoint 
construction, which is guaranteed to yield an inherently weak automaton (The- 
orem 4) that is easily converted to a weak one. This deterministic weak RVA is 
then complemented by inverting the accepting or non-accepting status of each 
of its components, and then removing from its accepted language the words 
that do not encode validly a vector (which is done by means of an intersection
operation).

Applying an existential quantifier to a weak RVA is first done by removing
from each transition label the symbol corresponding to the vector component
that is projected out. This produces a non-deterministic weak automaton that
may only accept some encodings of each vector in the quantified set, but generally
not all of them. The second step thus consists of modifying the automaton so as
to make it accept every encoding of each vector that it recognizes. Since different
encodings of the same vector differ only in the number of times that their leading
symbol is repeated, this operation can be carried out by the same procedure as
the one used with finite-word number automata [Boi98]. This operation does not
affect the weak nature of the automaton, which can then be determinized by the
breakpoint construction, which has to produce an inherently weak RVA easily
converted to a weak automaton.

Thus, in order to decide whether a formula of $\langle \mathbb{R}, \mathbb{Z}, +, \le \rangle$ is satisfiable, one
simply builds an RVA representing its set of solutions, and then checks whether
this automaton accepts a nonempty language. This also makes it possible to
check the inclusion or the equivalence of sets represented by RVA. The main 
result of this paper is that, at every point, the constructed automaton remains 
weak and thus only the simple breakpoint construction is needed as a deter- 
minization procedure. 



7 Conclusions 

A probably unusual aspect of this paper is that it does not introduce new al- 
gorithms, but rather shows that existing algorithms can be used in a situation 
where a priori they could not be expected to operate correctly. To put it in other 
words, the contribution is not the algorithm but the proof of its correctness. 

The critical reader might be wondering if all this is really necessary. After all,
algorithms for complementing Büchi automata exist, either through determinization
[Saf88] or directly [Büc62,SVW87,Kla91,KV97], and the more recent of these
are even fairly simple and potentially implementable. There are no perfectly
objective grounds on which to evaluate “simplicity” and “ease of implementation”,
but it is not difficult to convince oneself that the breakpoint construction for
determinizing weak automata is simpler than anything proposed for determinizing
or complementing Büchi automata. Indeed, it is but one step of the probably
simplest complementation procedure proposed so far, that of [KV97]. Furthermore,
there is a complexity improvement from $2^{O(n \log n)}$ to $2^{O(n)}$; experience with the
subset construction as used for instance in the LASH tool [LASH] indicates that
the breakpoint construction is likely to operate very well in practice; and being
able to work with deterministic automata allows minimization [Löd01], which
leads to a normal form.

An implementation and some experiments would of course substantiate the 
claims to simplicity and ease of implementation. It is planned in the context of 
the LASH tool and will be made available [LASH]. However, this paper is not 
about an implementation, but about the fact that, with the help of what might 




624 B. Boigelot, S. Jodogne, and P. Wolper 



appear to be pure theory, one can obtain very interesting conclusions about
algorithms for handling the theory $\langle \mathbb{R}, \mathbb{Z}, +, \le \rangle$.

Abstract. The modalities of Dynamic Logic refer to the final state of
a program execution and allow one to specify programs with pre- and
postconditions. In this paper, we extend Dynamic Logic with additional trace
modalities “throughout” and “at least once” , which refer to all the states 
a program reaches. They allow one to specify and verify invariants and 
safety constraints that have to be valid throughout the execution of a 
program. We give a sound and (relatively) complete sequent calculus for 
this extended Dynamic Logic. 



1 Introduction 

We present a sequent calculus for an extended version of Dynamic Logic (DL) 
that has additional modalities “throughout” and “at least once” referring to the 
intermediate states of program execution. 

Dynamic Logic [10,5,9,6] can be seen as an extension of Hoare logic [2]. It
is a first-order modal logic with modalities $[\alpha]$ and $\langle\alpha\rangle$ for every program $\alpha$.
These modalities refer to the worlds (called states in the DL framework) in
which the program $\alpha$ terminates when started in the current world. The
formula $[\alpha]\phi$ expresses that $\phi$ holds in all final states of $\alpha$, and $\langle\alpha\rangle\phi$ expresses
that $\phi$ holds in some final state of $\alpha$. In versions of DL with a non-deterministic
programming language there can be several such final states (worlds). Here we
consider a Deterministic Dynamic Logic (DDL) with a deterministic while
programming language [4,7]. For deterministic programs there is exactly one final
world (if $\alpha$ terminates) or there is no final world (if $\alpha$ does not terminate). The
formula $\phi \to \langle\alpha\rangle\psi$ is valid if, for every state $s$ satisfying pre-condition $\phi$, a run
of the program $\alpha$ starting in $s$ terminates, and in the terminating state the
post-condition $\psi$ holds. The formula $\phi \to [\alpha]\psi$ expresses the same, except that
termination of $\alpha$ is not required, i.e., $\psi$ only has to hold if $\alpha$ terminates.

Thus, $\phi \to [\alpha]\psi$ is similar to the Hoare triple $\{\phi\}\alpha\{\psi\}$. But in contrast to
Hoare logic, the set of formulas of DL is closed under the usual logical operators.
In Hoare logic, the formulas $\phi$ and $\psi$ are pure first-order formulas, whereas in
DL they can contain programs. That is, DL allows one to involve programs in
the formalisation of pre- and post-conditions. The advantage of using programs
is that one can easily specify, for example, that some data structure is not cyclic,
which is impossible in pure first-order logic.

In some regard, however, standard DL (and DDL) still lacks expressivity: 
The semantics of a program is a relation between states; formulas can only 
describe the input/output behaviour of programs. Standard DL cannot be used 
to reason about program behaviour not manifested in the input/output relation.
It is inadequate for reasoning about non-terminating programs and for verifying 
invariants or constraints that must be valid throughout program execution. 

We overcome this deficiency and increase the expressivity of DDL by adding 
two new modalities $[\![\alpha]\!]$ (“throughout”) and $\langle\!\langle\alpha\rangle\!\rangle$ (“at least once”). In the
extended logic, which we call (Deterministic) Dynamic Logic with Trace
Modalities (DLT), the semantics of a program is the sequence of all states its execution
passes through when started in the current state (its trace). It is possible in 
DLT to specify properties of the intermediate states of terminating and non- 
terminating programs. And such properties (typically safety constraints) can be 
verified using the calculus presented in Section 4. This is of great importance as 
safety constraints occur in many application domains of program verification. 

Previous work in this area includes Pratt’s Process Logic [10,11], which is 
an extension of propositional DL with trace modalities (DLT can be seen as 
a first-order Process Logic). Also, Temporal Logics have modalities that allow 
one to talk about intermediate states. There, however, the program is fixed and 
considered to be part of the structure over which the formulas are interpreted. 
Temporal Logics, thus, do not have the compositionality of Dynamic Logics. 

The calculus for DDL described in [7] (which is based on the one given in [4]) 
has been implemented in the software verification systems KIV [12] and VSE [8]. 
It has successfully been used to verify software systems of considerable size. 

The work reported here has been carried out as part of the KeY project [1].¹
The goal of KeY is to enhance a commercial CASE tool with functionality for 
formal specification and deductive verification and, thus, to integrate formal 
methods into real-world software development processes. In the KeY project, a 
version of DL for the Java Card programming language [3] is used for verifica- 
tion. Deduction in DL (and DLT) is based on symbolic program execution and 
simple program transformations and is, thus, close to a programmer’s under- 
standing of a program’s semantics. Our motivation for considering trace modal- 
ities was that in typical real-world specifications as they are done with the help 
of CASE tools, there are often program parts for which invariants and safety 
constraints are given, but for which the user did not bother to give a full speci- 
fication with pre- and post-conditions. 

We define the syntax of DLT in Section 2 and its semantics in Section 3. In 
Section 4, we describe our sequent calculus for DLT. Theorems stating soundness 
and (relative) completeness are presented in Section 5 (due to space restrictions, 
the proofs are only sketched; they can be found in [13]). In Section 6, we give
an example for verifying that a non-terminating program preserves a certain 
invariant. Finally, in Section 7, we discuss future work. 

¹ More information on KeY can be found at i12www.ira.uka.de/~key.



2 Syntax of DL with Trace Modalities

In first-order DL, states are not abstract points (as in propositional DL) but 
valuations of variables. Atomic programs are assignments of the form $x := t$.
Executing $x := t$ changes the program state by assigning the value of the term $t$
to the variable $x$. The value of a term $t$ depends on the current state $s$ (namely
the value that $s$ gives to the variables occurring in $t$). The function symbols are
interpreted using a fixed first-order structure. This domain of computation, over 
which quantification is allowed, can be considered to define the data structures 
used in the programs. The logic DLT as well as the calculus presented in Section 4 
are basically independent of the domain actually used. The only restriction is 
that the domain must be sufficiently expressive. In the following, for the sake 
of simplicity, we use arithmetic as the single domain. In practice, there will 
be additional function and predicate symbols and different types of variables 
ranging over different sorts of a many-sorted domain (different data structures).

The arithmetic signature $\Sigma_\mathbb{N}$ contains (a) the constant $0$ (zero) and the unary
function symbol $s$ (successor) as constructors (in the following we abbreviate
terms of the form $s(\cdots s(0) \cdots)$ with their decimal representation, e.g. “2”
abbreviates “$s(s(0))$”), (b) the binary function symbols $+$ (addition) and $*$
(multiplication), and (c) the binary predicate symbols $\le$ (less than or equal) and
$=$ (equality). In addition, there is an infinite set Var of object variables, which
are also used as program variables. The set $\mathit{Term}_\mathbb{N}$ of terms over $\Sigma_\mathbb{N}$ is built as
usual in first-order predicate logic from the variables in Var and the function
symbols in $\Sigma_\mathbb{N}$. The formulas of first-order predicate logic without modal
operators (FOL-formulas) over $\Sigma_\mathbb{N}$ are constructed as usual from the terms in $\mathit{Term}_\mathbb{N}$
and the predicate symbols in $\Sigma_\mathbb{N}$, using the classical connectives $\wedge$ (conjunction),
$\vee$ (disjunction), $\to$ (implication), and $\neg$ (negation), and the quantifiers $\forall$ and $\exists$.

We proceed to define what the programs of the deterministic programming 
language of DDL and DLT are. The programming constructs for forming complex 
programs from the atomic assignments are the concatenation of programs, if- 
then-else conditionals, and while loops (the two latter program constructs use 
quantifier-free FOL-formulas as conditions). 

Definition 1. The programs of DLT are recursively defined by: (i) If $x \in$ Var
and $t \in \mathit{Term}_\mathbb{N}$, then $x := t$ is a program (assignment). (ii) If $\alpha$ and $\beta$ are
programs, then $\alpha; \beta$ is a program (concatenation). (iii) If $\alpha$ and $\beta$ are programs
and $\varepsilon$ is a quantifier-free FOL-formula, then if $\varepsilon$ then $\alpha$ else $\beta$ is a program
(conditional). (iv) If $\alpha$ is a program and $\varepsilon$ is a quantifier-free FOL-formula, then
while $\varepsilon$ do $\alpha$ is a program (loop).

The programs of DLT form a computationally complete programming
language. For every partial recursive function $f : \mathbb{N} \to \mathbb{N}$ there is a program $\alpha_f(x)$
that computes $f$, i.e., if $\alpha_f(x)$ is started in an arbitrary state in which the value
of $x$ is some $n \in \mathbb{N}$, then it terminates in a state in which the value of $x$ is $f(n)$.

Now, we define the formulas of DLT. Note that the first four conditions in
Definition 2 are the same as in the definition of FOL-formulas. Only the last
condition is new, which adds the modalities (and programs) to the formulas.

Definition 2. The set of DLT-formulas is recursively defined by: (i) true and
false are DLT-formulas. (ii) If $t_1, t_2 \in \mathit{Term}_\mathbb{N}$, then $t_1 \le t_2$ and $t_1 = t_2$ are
DLT-formulas. (iii) If $\phi, \psi$ are DLT-formulas, then so are $\neg\phi$, $\phi \wedge \psi$, $\phi \vee \psi$, and
$\phi \to \psi$. (iv) If $\phi$ is a DLT-formula and $x \in$ Var, then $\exists x \phi$, $\forall x \phi$ are
DLT-formulas. (v) If $\phi$ is a DLT-formula and $\alpha$ is a program (Def. 1), then $[\alpha]\phi$,
$\langle\alpha\rangle\phi$, $[\![\alpha]\!]\phi$, and $\langle\!\langle\alpha\rangle\!\rangle\phi$ are DLT-formulas.



Definition 3. A sequent is of the form $\phi_1, \ldots, \phi_m \vdash \psi_1, \ldots, \psi_n$ $(m, n \ge 0)$,
where the $\phi_i$ and $\psi_j$ are DLT-formulas. The order of the $\phi_i$ resp. the $\psi_j$ is
irrelevant, i.e., $\phi_1, \ldots, \phi_m$ and $\psi_1, \ldots, \psi_n$ are treated as multi-sets.



Definition 4. A variable $x \in$ Var is bound in a DLT-formula $\phi$ if it occurs
inside the scope of (i) a quantification $\forall x$ resp. $\exists x$, or (ii) a modality $[\alpha]$, $\langle\alpha\rangle$,
$[\![\alpha]\!]$, or $\langle\!\langle\alpha\rangle\!\rangle$ containing an assignment $x := t$. The variable $x$ is free in $\phi$ if there
is an occurrence of $x$ in $\phi$ that is neither bound by a quantifier nor a modality.



Definition 5. A substitution assigns to each object variable in Var a term in
$\mathit{Term}_\mathbb{N}$. A substitution $\sigma$ is applied to a DLT-formula $\phi$ by replacing all free
occurrences of variables $x$ in $\phi$ by $\sigma(x)$.

If a substitution $\{x/t\}$ instantiates only a single variable $x$, its application to
a formula $\phi$ or a formula multi-set $\Gamma$ is denoted by $\phi_x^t$ resp. $\Gamma_x^t$.

A substitution $\sigma$ is admissible w.r.t. a DLT-formula $\phi$ if there are no variables
$x$ and $y$ such that $x$ is free in $\phi$, $y$ occurs in $\sigma(x)$, and, after substituting $\sigma(x)$ for
some free occurrence of $x$ in $\phi$, the occurrence of $y$ in $\sigma(x)$ is bound in $\sigma(\phi)$.

3 Semantics of DL with Trace Modalities 

Since we use arithmetic as the only domain of computation, the semantics of DLT
is defined using a single fixed model, namely $\langle \mathbb{N}, I_\mathbb{N} \rangle$. It consists of the universe $\mathbb{N}$
of natural numbers and the canonical interpretation function $I_\mathbb{N}$ assigning the
function and predicate symbols of $\Sigma_\mathbb{N}$ their natural meaning in arithmetic.

The states (worlds) of the model (only) differ in the value assigned to the 
object variables. Therefore, the states can be defined to be variable assignments. 

Definition 6. A state $s$ assigns to each variable $x \in$ Var a number $s(x) \in \mathbb{N}$.

Let $x \in$ Var and $n \in \mathbb{N}$; then $s' = s\{x \leftarrow n\}$ is the state that is identical to $s$
except that $x$ is assigned $n$, i.e., $s'(x) = n$ and $s'(y) = s(y)$ for all $y \neq x$.

The truth value of DLT-formulas in a state $s$ is given by a valuation function
$\mathit{val}_s$ that assigns to each term $t \in \mathit{Term}_\mathbb{N}$ a natural number $\mathit{val}_s(t) \in \mathbb{N}$ and to
each formula one of the truth values $\mathrm{t}$ and $\mathrm{f}$. This function is defined step by step.
For variables $x \in$ Var, it is defined by $\mathit{val}_s(x) = s(x)$. It is extended to terms and
FOL-formulas as usual in first-order predicate logic (note that the way in which
function symbols are interpreted depends on the interpretation function of the
domain of computation, which in our case is $I_\mathbb{N}$). Below, we describe how $\mathit{val}_s$ is
defined for programs (Def. 7) and, finally, is extended to DLT-formulas (Def. 8).

In DDL, where the modalities only refer to the final state of a program
execution, the semantics of a program $\alpha$ is a reachability relation on states: A
state $s'$ is $\alpha$-reachable from $s$ if $\alpha$ terminates in $s'$ when started in $s$. In DLT the
situation is different. The additional modalities refer to the intermediate states
as well. Since the programs are deterministic, their intermediate states form a
sequence. Thus, the semantics of a program $\alpha$ w.r.t. a state $s$ is the (finite or
infinite) sequence of all states that $\alpha$ reaches when started in $s$, called the trace
of $\alpha$. It includes the initial state $s$ (and the final state in case $\alpha$ terminates).

Definition 7. A trace is a non-empty, finite or infinite sequence of states.

The last element of a finite trace $T$ is denoted by $\mathit{last}(T)$.

The concatenation of traces $T_1$ and $T_2$ is defined by: $T_1 \circ T_2 = T_1$ if $T_1$ is
infinite, and $T_1 \circ T_2 = (s^1_1, \ldots, s^1_k, s^2_2, s^2_3, \ldots)$ if $T_1 = (s^1_1, \ldots, s^1_k)$ is finite and
$T_2 = (s^2_1, s^2_2, s^2_3, \ldots)$ (the first state of $T_2$ is omitted in the concatenation).

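Given a state $s$, the valuation function $\mathit{val}_s$ assigns a trace to each program
as follows:

- $\mathit{val}_s(x := t) = (s,\ s\{x \leftarrow \mathit{val}_s(t)\})$.

- $\mathit{val}_s(\alpha;\beta) = \mathit{val}_s(\alpha) \circ \mathit{val}_{\mathit{last}(\mathit{val}_s(\alpha))}(\beta)$.

- $\mathit{val}_s(\mathbf{if}\ \epsilon\ \mathbf{then}\ \alpha\ \mathbf{else}\ \beta)$ is defined to be equal to $\mathit{val}_s(\alpha)$ if $\mathit{val}_s(\epsilon) = \mathsf{t}$ and
to be equal to $\mathit{val}_s(\beta)$ if $\mathit{val}_s(\epsilon) = \mathsf{f}$.

- $\mathit{val}_s(\mathbf{while}\ \epsilon\ \mathbf{do}\ \alpha)$ is defined as follows (there are three cases). Let $s_n$ be the
initial state of the $n$-th iteration of the loop body $\alpha$, i.e., $s_1 = s$ and, for $n \geq 1$,
$s_{n+1} = \mathit{last}(\mathit{val}_{s_n}(\alpha))$ if $s_n$ is defined and $\mathit{val}_{s_n}(\alpha)$ is finite (otherwise $s_{n+1}$
remains undefined).

Case 1 (the loop terminates): If for some $n \in \mathbb{N}$, (i) $\mathit{val}_{s_i}(\alpha)$ is finite for all
$i \leq n$, (ii) $\mathit{val}_{s_i}(\epsilon) = \mathsf{t}$ for all $i \leq n$, and (iii) $\mathit{val}_{s_{n+1}}(\epsilon) = \mathsf{f}$, then we define
$\mathit{val}_s(\mathbf{while}\ \epsilon\ \mathbf{do}\ \alpha)$ to be the finite sequence $\mathit{val}_{s_1}(\alpha) \circ \cdots \circ \mathit{val}_{s_n}(\alpha)$.

Case 2 (each iteration terminates but the condition $\epsilon$ remains true such that
the loop does not terminate): If for all $n \geq 1$, (i) $\mathit{val}_{s_n}(\alpha)$ is finite and
(ii) $\mathit{val}_{s_n}(\epsilon) = \mathsf{t}$, then we define $\mathit{val}_s(\mathbf{while}\ \epsilon\ \mathbf{do}\ \alpha)$ to be the infinite sequence
$\mathit{val}_{s_1}(\alpha) \circ \mathit{val}_{s_2}(\alpha) \circ \cdots$.

Case 3 (some iteration does not terminate): If for some $n \in \mathbb{N}$, (i) $\mathit{val}_{s_i}(\alpha)$
is finite for $i < n$, (ii) $\mathit{val}_{s_n}(\alpha)$ is infinite, and (iii) $\mathit{val}_{s_i}(\epsilon) = \mathsf{t}$ for all $i \leq n$,
then $\mathit{val}_s(\mathbf{while}\ \epsilon\ \mathbf{do}\ \alpha)$ is the infinite sequence $\mathit{val}_{s_1}(\alpha) \circ \cdots \circ \mathit{val}_{s_n}(\alpha)$.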
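
The trace semantics above can be sketched as a small interpreter. This is only an illustration under assumptions that are not part of DLT itself: programs are encoded as nested tuples, states as dictionaries, and terms and conditions as Python functions on states (all of these names and encodings are hypothetical). Generators produce traces lazily, so an infinite trace can still be inspected through a finite prefix.

```python
from itertools import islice

# Hypothetical encoding chosen for this sketch:
#   ("assign", x, term)    x := t, where term maps a state to a number
#   ("seq", a, b)          a ; b
#   ("if", cond, a, b)     if cond then a else b
#   ("while", cond, a)     while cond do a
# A state is a dict from variable names to natural numbers.

def trace(prog, s):
    """Lazily yield the states of the trace of prog started in state s."""
    kind = prog[0]
    if kind == "assign":
        _, x, term = prog
        yield s                       # initial state
        yield {**s, x: term(s)}       # state after the assignment
    elif kind == "seq":
        _, a, b = prog
        last = s
        for last in trace(a, s):      # never finishes if a diverges
            yield last
        rest = trace(b, last)
        next(rest)                    # first state of the second trace is omitted
        yield from rest
    elif kind == "if":
        _, cond, a, b = prog
        yield from trace(a if cond(s) else b, s)
    elif kind == "while":
        _, cond, body = prog
        yield s
        while cond(s):
            it = trace(body, s)
            next(it)                  # initial state of this iteration was already emitted
            for s in it:              # s ends up as the last state of the iteration
                yield s
    else:
        raise ValueError(f"unknown program construct: {kind}")

# while true do x := x + 1 has an infinite trace; look at a prefix only.
inc_loop = ("while", lambda s: True, ("assign", "x", lambda s: s["x"] + 1))
prefix = list(islice(trace(inc_loop, {"x": 0}), 4))
```

Because the loop above never terminates, only `islice` makes the result finite; calling `list(trace(inc_loop, ...))` directly would not return.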

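Definition 8. Given a state $s$, the valuation function $\mathit{val}_s$ assigns to a DLT-formula
$\phi$ one of the truth values $\mathsf{t}$ and $\mathsf{f}$ as follows: (i) If $\phi$ is true, false, or an
atomic formula, or its principal logical operator is one of the classical operators
$\wedge$, $\vee$, $\to$, $\neg$, or one of the quantifiers $\forall$, $\exists$, then $\mathit{val}_s(\phi)$ is recursively defined
as usual in first-order predicate logic. (ii) $\mathit{val}_s([\alpha]\phi) = \mathsf{t}$ iff $\mathit{val}_s(\alpha)$ is infinite
or $\mathit{val}_{s'}(\phi) = \mathsf{t}$ where $s' = \mathit{last}(\mathit{val}_s(\alpha))$. (iii) $\mathit{val}_s(\langle\alpha\rangle\phi) = \mathsf{t}$ iff $\mathit{val}_s(\alpha)$ is finite
and $\mathit{val}_{s'}(\phi) = \mathsf{t}$ where $s' = \mathit{last}(\mathit{val}_s(\alpha))$. (iv) $\mathit{val}_s([\![\alpha]\!]\phi) = \mathsf{t}$ iff $\mathit{val}_{s'}(\phi) = \mathsf{t}$ for
all $s' \in \mathit{val}_s(\alpha)$. (v) $\mathit{val}_s(\langle\!\langle\alpha\rangle\!\rangle\phi) = \mathsf{t}$ iff $\mathit{val}_{s'}(\phi) = \mathsf{t}$ for at least one $s' \in \mathit{val}_s(\alpha)$.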
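
For a terminating program, whose trace is finite, the four clauses (ii) to (v) can be evaluated directly. The following is a minimal sketch under that assumption; the function names and the representation of a trace as a Python list of state dictionaries are illustrative choices, not notation from the definition. For non-terminating programs, clauses (ii) and (iii) differ and the trace cannot be held in a list.

```python
def box(trace, phi):
    """[a]phi for a finite trace: phi holds in the final state."""
    return phi(trace[-1])

def diamond(trace, phi):
    """<a>phi for a finite trace: termination is given, phi holds in the final state."""
    return phi(trace[-1])

def throughout(trace, phi):
    """[[a]]phi: phi holds in every state of the trace."""
    return all(phi(s) for s in trace)

def at_least_once(trace, phi):
    """<<a>>phi: phi holds in at least one state of the trace."""
    return any(phi(s) for s in trace)

# Trace of the assignment x := x + 1 started in the state where x = 5.
t = [{"x": 5}, {"x": 6}]
is_even = lambda s: s["x"] % 2 == 0
```

With this trace, `throughout(t, lambda s: s["x"] <= 6)` and `at_least_once(t, is_even)` both hold, while `throughout(t, is_even)` does not, since x = 5 in the initial state.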
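Table 1. The elementary rules of the calculus.

Axioms

(R1)  $\Gamma, \phi \vdash \phi, \Delta$

(R2)  $\Gamma \vdash \mathit{true}, \Delta$

(R3)  $\Gamma, \mathit{false} \vdash \Delta$

Rules for classical logical operators and quantifiers

(R4)  $\dfrac{\Gamma, \phi \vdash \Delta}{\Gamma \vdash \neg\phi, \Delta}$

(R5)  $\dfrac{\Gamma \vdash \phi, \Delta}{\Gamma, \neg\phi \vdash \Delta}$

(R6)  $\dfrac{\Gamma \vdash \phi, \Delta \qquad \Gamma \vdash \psi, \Delta}{\Gamma \vdash \phi \wedge \psi, \Delta}$

(R7)  $\dfrac{\Gamma, \phi, \psi \vdash \Delta}{\Gamma, \phi \wedge \psi \vdash \Delta}$

(R8)  $\dfrac{\Gamma \vdash \phi, \psi, \Delta}{\Gamma \vdash \phi \vee \psi, \Delta}$

(R9)  $\dfrac{\Gamma, \phi \vdash \Delta \qquad \Gamma, \psi \vdash \Delta}{\Gamma, \phi \vee \psi \vdash \Delta}$

(R10)  $\dfrac{\Gamma, \phi \vdash \psi, \Delta}{\Gamma \vdash \phi \to \psi, \Delta}$

(R11)  $\dfrac{\Gamma \vdash \phi, \Delta \qquad \Gamma, \psi \vdash \Delta}{\Gamma, \phi \to \psi \vdash \Delta}$

(R12)  $\dfrac{\Gamma \vdash \phi_x^{x'}, \Delta}{\Gamma \vdash \forall x\,\phi, \Delta}$  where $x'$ is new w.r.t. $\phi, \Gamma, \Delta$

(R13)  $\dfrac{\Gamma, \forall x\,\phi, \phi_x^{t} \vdash \Delta}{\Gamma, \forall x\,\phi \vdash \Delta}$  where $\{x/t\}$ is admissible w.r.t. $\phi$

(R14)  $\dfrac{\Gamma, \phi_x^{x'} \vdash \Delta}{\Gamma, \exists x\,\phi \vdash \Delta}$  where $x'$ is new w.r.t. $\phi, \Gamma, \Delta$

(R15)  $\dfrac{\Gamma \vdash \phi_x^{t}, \exists x\,\phi, \Delta}{\Gamma \vdash \exists x\,\phi, \Delta}$  where $\{x/t\}$ is admissible w.r.t. $\phi$

Weakening and Cut

(R16)  $\dfrac{\Gamma \vdash \Delta}{\Gamma \vdash \phi, \Delta}$

(R17)  $\dfrac{\Gamma \vdash \Delta}{\Gamma, \phi \vdash \Delta}$

(R18)  $\dfrac{\Gamma, \phi \vdash \Delta \qquad \Gamma \vdash \phi, \Delta}{\Gamma \vdash \Delta}$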



Definition 9. If $\mathit{val}_s(\phi) = \mathsf{t}$, then $\phi$ is said to be true in the state $s$; otherwise
it is false in $s$. A formula is valid if it is true in all states.

A sequent $\Gamma \vdash \Delta$ is valid iff the DLT-formula $\bigwedge\Gamma \to \bigvee\Delta$ is valid.



4 A Sequent Calculus for DL with Trace Modalities 

In this section, we present a sequent calculus for DLT, which we call $\mathcal{C}_{DLT}$. It
is sound and relatively complete, i.e., complete up to the handling of arithmetic
(see Section 5). The set of those $\mathcal{C}_{DLT}$-rules in which the additional modalities
$[\![\cdot]\!]$ and $\langle\!\langle\cdot\rangle\!\rangle$ do not occur forms a sound and (relatively) complete calculus for
DDL. This restriction of $\mathcal{C}_{DLT}$ is similar to the DDL-calculus described in [7].

Most rules of the calculus are analytic and therefore could be applied auto- 
matically. The rules that require user interaction are: (a) the rules for handling 
while loops (where a loop invariant has to be provided), (b) the induction rule 
(where a useful induction hypothesis has to be found), (c) the cut rule (where 
the right case distinction has to be used), and (d) the quantifier rules (where the 
right instantiation has to be found). 

In the rule schemata, $\Gamma$, $\Delta$ denote arbitrary, possibly empty multi-sets of
formulas, and $\phi$, $\psi$ denote arbitrary formulas. As usual, the sequents above the
horizontal line in a schema are its premisses and the single sequent below the
horizontal line is its conclusion. Note, however, that in practice the rules are
applied from bottom to top. Proof construction starts with the original proof
obligation at the bottom. Therefore, if a constraint is attached to a rule that
requires a variable to be “new”, it has to be new w.r.t. the conclusion.

Table 2. The rules for handling arithmetic.

Oracle rules

(R19)  $\Gamma \vdash \Delta$  where $\bigwedge\Gamma \to \bigvee\Delta$ is a valid arithmetical FOL-formula

(R20)  $\dfrac{\Gamma_1', \Gamma_2 \vdash \Delta}{\Gamma_1, \Gamma_2 \vdash \Delta}$  where $\bigwedge\Gamma_1 \to \bigwedge\Gamma_1'$ is a valid arithmetical FOL-formula

Induction

(R21)  $\dfrac{\Gamma \vdash \phi(0), \Delta \qquad \phi(n) \vdash \phi(s(n)), \Delta}{\Gamma \vdash \forall n\,\phi(n), \Delta}$  where $n$ does not occur in $\Delta$

Definition 10. The calculus $\mathcal{C}_{DLT}$ consists of the rules (R1) to (R51) shown in
Tables 1-4. A sequent is derivable (with $\mathcal{C}_{DLT}$) if it is an instance of the conclusion
of a rule schema and all corresponding instances of the premisses of that rule
schema are derivable sequents. In particular, all sequents are derivable that are
instances of the conclusion of a rule that has no premisses (R1, R2, R3, R19).



The Elementary Rules. The elementary rules of $\mathcal{C}_{DLT}$ are shown in Table 1.
The table contains rules for axioms (which have no premisses and make it pos- 
sible to close a branch in the proof tree), rules for the propositional operators 
and the quantifiers, weakening rules, and the cut rule. These rules form a sound 
and complete calculus for first-order predicate logic. 



Rules for Handling Arithmetic. Our calculus is basically independent of 
the domain of computation resp. data structures that are used. We therefore 
abstract from the problem of handling the data structure(s) and just assume 
that an oracle is available that can decide the validity of FOL-formulas in the 
domain of computation (note that the oracle only decides pure FOL-formulas). 
In the case of arithmetic, the oracle is represented by rule (R19) in Table 2. 
Rule (R20) is an alternative formalisation of the oracle that is often more useful. 

Of course, the set of FOL-formulas that are valid in arithmetic is not even
enumerable. Therefore, in practice, the oracle can only be approximated, and rules
(R19) and (R20) must be replaced by a rule (or set of rules) for computing resp.
enumerating a subset of all valid FOL-formulas (in particular, these rules must
include equality handling). This is not harmful to “practical completeness”. Rule
sets for arithmetic are available that, as experience shows, allow one to derive
all valid FOL-formulas that occur during the verification of actual programs.

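Typically, an approximation of the computation domain oracle contains a
rule for structural induction. In the case of arithmetic, that is rule (R21). This
rule, however, is not only used to approximate the arithmetic oracle but is
indispensable for completeness. It not only applies to FOL-formulas but also to
DLT-formulas containing programs; and it is needed for handling the modalities
$\langle\cdot\rangle$ and $\langle\!\langle\cdot\rangle\!\rangle$ when they contain while loops (see Section 4).

Table 3. Rules for the modal operators.

Assignment (in all four rules, $x'$ is new w.r.t. $t, \phi, \Gamma, \Delta$)

(R22)  $\dfrac{\Gamma_x^{x'},\ x = t_x^{x'} \vdash \phi,\ \Delta_x^{x'}}{\Gamma \vdash [x := t]\phi,\ \Delta}$

(R23)  $\dfrac{\Gamma_x^{x'},\ x = t_x^{x'} \vdash \phi,\ \Delta_x^{x'}}{\Gamma \vdash \langle x := t\rangle\phi,\ \Delta}$

(R24)  $\dfrac{\Gamma \vdash \phi,\ \Delta \qquad \Gamma_x^{x'},\ x = t_x^{x'} \vdash \phi,\ \Delta_x^{x'}}{\Gamma \vdash [\![x := t]\!]\phi,\ \Delta}$

(R25)  $\dfrac{\Gamma_x^{x'},\ x = t_x^{x'} \vdash \phi_x^{x'},\ \phi,\ \Delta_x^{x'}}{\Gamma \vdash \langle\!\langle x := t\rangle\!\rangle\phi,\ \Delta}$

Concatenation

(R26)  $\dfrac{\Gamma \vdash [\alpha][\beta]\phi,\ \Delta}{\Gamma \vdash [\alpha;\beta]\phi,\ \Delta}$

(R27)  $\dfrac{\Gamma \vdash \langle\alpha\rangle\langle\beta\rangle\phi,\ \Delta}{\Gamma \vdash \langle\alpha;\beta\rangle\phi,\ \Delta}$

(R28)  $\dfrac{\Gamma \vdash [\![\alpha]\!]\phi,\ \Delta \qquad \Gamma \vdash [\alpha][\![\beta]\!]\phi,\ \Delta}{\Gamma \vdash [\![\alpha;\beta]\!]\phi,\ \Delta}$

(R29)  $\dfrac{\Gamma \vdash \langle\!\langle\alpha\rangle\!\rangle\phi,\ \langle\alpha\rangle\langle\!\langle\beta\rangle\!\rangle\phi,\ \Delta}{\Gamma \vdash \langle\!\langle\alpha;\beta\rangle\!\rangle\phi,\ \Delta}$

If-then-else

(R30)  $\dfrac{\Gamma, \epsilon \vdash [\alpha]\phi,\ \Delta \qquad \Gamma, \neg\epsilon \vdash [\beta]\phi,\ \Delta}{\Gamma \vdash [\mathbf{if}\ \epsilon\ \mathbf{then}\ \alpha\ \mathbf{else}\ \beta]\phi,\ \Delta}$

(R31)  $\dfrac{\Gamma, \epsilon \vdash \langle\alpha\rangle\phi,\ \Delta \qquad \Gamma, \neg\epsilon \vdash \langle\beta\rangle\phi,\ \Delta}{\Gamma \vdash \langle\mathbf{if}\ \epsilon\ \mathbf{then}\ \alpha\ \mathbf{else}\ \beta\rangle\phi,\ \Delta}$

(R32)  $\dfrac{\Gamma, \epsilon \vdash [\![\alpha]\!]\phi,\ \Delta \qquad \Gamma, \neg\epsilon \vdash [\![\beta]\!]\phi,\ \Delta}{\Gamma \vdash [\![\mathbf{if}\ \epsilon\ \mathbf{then}\ \alpha\ \mathbf{else}\ \beta]\!]\phi,\ \Delta}$

(R33)  $\dfrac{\Gamma, \epsilon \vdash \langle\!\langle\alpha\rangle\!\rangle\phi,\ \Delta \qquad \Gamma, \neg\epsilon \vdash \langle\!\langle\beta\rangle\!\rangle\phi,\ \Delta}{\Gamma \vdash \langle\!\langle\mathbf{if}\ \epsilon\ \mathbf{then}\ \alpha\ \mathbf{else}\ \beta\rangle\!\rangle\phi,\ \Delta}$

While

(R34)  $\dfrac{\Gamma \vdash \mathit{Inv},\ \Delta \qquad \mathit{Inv}, \epsilon \vdash [\alpha]\mathit{Inv} \qquad \mathit{Inv}, \neg\epsilon \vdash \phi}{\Gamma \vdash [\mathbf{while}\ \epsilon\ \mathbf{do}\ \alpha]\phi,\ \Delta}$  where $\mathit{Inv}$ is an arbitrary DLT-formula

(R35)  $\dfrac{\Gamma \vdash \epsilon,\ \Delta \qquad \Gamma \vdash \langle\alpha\rangle\langle\mathbf{while}\ \epsilon\ \mathbf{do}\ \alpha\rangle\phi,\ \Delta}{\Gamma \vdash \langle\mathbf{while}\ \epsilon\ \mathbf{do}\ \alpha\rangle\phi,\ \Delta}$

(R36)  $\dfrac{\Gamma \vdash \neg\epsilon,\ \Delta \qquad \Gamma \vdash \phi,\ \Delta}{\Gamma \vdash \langle\mathbf{while}\ \epsilon\ \mathbf{do}\ \alpha\rangle\phi,\ \Delta}$

(R37)  $\dfrac{\Gamma \vdash \mathit{Inv},\ \Delta \qquad \mathit{Inv}, \epsilon \vdash [\alpha]\mathit{Inv} \qquad \mathit{Inv}, \epsilon \vdash [\![\alpha]\!]\phi \qquad \mathit{Inv}, \neg\epsilon \vdash \phi}{\Gamma \vdash [\![\mathbf{while}\ \epsilon\ \mathbf{do}\ \alpha]\!]\phi,\ \Delta}$  where $\mathit{Inv}$ is an arbitrary DLT-formula

(R38)  $\dfrac{\Gamma \vdash \epsilon,\ \Delta \qquad \Gamma \vdash \langle\alpha\rangle\langle\!\langle\mathbf{while}\ \epsilon\ \mathbf{do}\ \alpha\rangle\!\rangle\phi,\ \Delta}{\Gamma \vdash \langle\!\langle\mathbf{while}\ \epsilon\ \mathbf{do}\ \alpha\rangle\!\rangle\phi,\ \Delta}$

(R39)  $\dfrac{\ldots\ \epsilon \vdash \langle\!\langle\alpha\rangle\!\rangle\phi,\ \Delta}{\Gamma \vdash \langle\!\langle\mathbf{while}\ \epsilon\ \mathbf{do}\ \alpha\rangle\!\rangle\phi,\ \Delta}$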



Rules for Modalities and Programs. The rules for the modal operators and 
the programs they contain are shown in Table 3. As is easy to see, they basically 
perform a symbolic program execution. 

There is a rule for each combination of program construct (assignment,
concatenation, if-then-else, while loop) and modality ($[\cdot]$, $\langle\cdot\rangle$, $[\![\cdot]\!]$, $\langle\!\langle\cdot\rangle\!\rangle$). To keep the
description of our calculus compact we only give rules for the case where the
modal formula is on the right side of a sequent. That is sufficient for completeness
because using the cut rule (R18) and the rules for negated modalities (R48)
to (R51) (see Table 4), every modal formula on the left side of a sequent can be
turned into an equivalent formula on the right side of the sequent. For example,
from the proof obligation $[\![\alpha]\!]\phi \vdash$ we get the proof obligation $\vdash \neg[\![\alpha]\!]\phi$ with
the cut rule, which then can be turned into $\vdash \langle\!\langle\alpha\rangle\!\rangle\neg\phi$ applying rule (R50).

Rules for Assignments. The rules for the modalities $[\cdot]$ (R22) and $\langle\cdot\rangle$ (R23) are
the traditional assignment rules of calculi for first-order DL. They introduce a
new variable $x'$ representing the old value of $x$ before the assignment $x := t$ is
executed. In the premisses of the assignment rules, both $x$ and $x'$ occur because
the premisses express the relation between the old and the new value of $x$ without
using an explicit assignment. Since assignments always terminate, there is no
difference between the two rules. Note that the premiss and the conclusion of
these rules are not necessarily equivalent (as a new symbol is introduced). But
if one is valid then the other is valid as well.

Example 1. Consider the valid sequent $x = 5 \vdash \langle x := x + 1\rangle\, x = 6$. Applying
rule (R23) yields the new sequent $x' = 5,\ x = x' + 1 \vdash x = 6$. It can be read
as: “If the old value of $x$ is 5 and its new value is its old value plus 1, then the
new value of $x$ is 6.” This exactly captures the meaning of the original sequent.

Assignments $x := t$ are atomic programs. By definition, their semantics is
a trace consisting of the initial state $s$ and the final state $s' = s\{x \leftarrow \mathit{val}_s(t)\}$.
Therefore, the meaning of $[\![x := t]\!]\phi$ is that $\phi$ is true in both $s$ and $s'$, which is
what the two premisses of rule (R24) express. The formula $\langle\!\langle x := t\rangle\!\rangle\phi$, on the
other hand, is true (in $s$) if $\phi$ is true in at least one of the two states. Note that
the two formulas $\phi_x^{x'}$ and $\phi$ in the premiss of rule (R25), which express that $\phi$ is
true in $s$ resp. $s'$, are implicitly disjunctively connected.

Example 2. We use rule (R24) to show that $x = 5 \vdash [\![x := x + 1]\!]\, x \leq 6$ is a
valid sequent. This results in the two new proof obligations $x = 5 \vdash x \leq 6$ and
$x' = 5,\ x = x' + 1 \vdash x \leq 6$. They state that $x \leq 6$ is true in both the initial and
the final state of the assignment.

Let $\mathit{even}(x)$ be an abbreviation for the FOL-formula $\exists y\,(x = 2 * y)$. To prove
the validity of $\vdash \langle\!\langle x := x + 1\rangle\!\rangle\,\mathit{even}(x)$, we apply rule (R25) and get the new
proof obligation $x = x' + 1 \vdash \mathit{even}(x),\ \mathit{even}(x')$, which is obviously valid.
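
Written out as rule instances, the two steps of Example 2 look as follows. This is a worked restatement of the example, not an additional rule: in the first instance $\Gamma = \{x = 5\}$, $\Delta$ is empty, $t$ is $x + 1$, and $\phi$ is $x \leq 6$; in the second, $\Gamma$ and $\Delta$ are empty and $\phi$ is $\mathit{even}(x)$.

```latex
\[
\frac{x = 5 \vdash x \leq 6
      \qquad
      x' = 5,\ x = x' + 1 \vdash x \leq 6}
     {x = 5 \vdash [\![x := x + 1]\!]\; x \leq 6}
\ (\mathrm{R24})
\qquad\qquad
\frac{x = x' + 1 \vdash \mathit{even}(x'),\ \mathit{even}(x)}
     {\vdash \langle\!\langle x := x + 1 \rangle\!\rangle\; \mathit{even}(x)}
\ (\mathrm{R25})
\]
```

In the second instance, the two formulas on the right of the premiss stand for "even in the initial state" ($\mathit{even}(x')$) and "even in the final state" ($\mathit{even}(x)$); since $x$ and $x'$ differ by one, one of them always holds.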

Rules for Concatenation. Again, the rules for the modalities $[\cdot]$ (R26) and
$\langle\cdot\rangle$ (R27) are the traditional rules for first-order DL. They are based on the
equivalences $[\alpha;\beta]\phi \leftrightarrow [\alpha][\beta]\phi$ resp. $\langle\alpha;\beta\rangle\phi \leftrightarrow \langle\alpha\rangle\langle\beta\rangle\phi$.

In the case of the $[\![\cdot]\!]$ modality, the concatenation rule (R28) branches. To
show that a formula $\phi$ is true throughout the execution of $\alpha;\beta$, one has to prove
(a) that $\phi$ is true throughout the execution of $\alpha$, i.e. $[\![\alpha]\!]\phi$, and (b) provided $\alpha$
terminates, that $\phi$ is true throughout the execution of $\beta$ that is started in the
final state of $\alpha$, i.e. $[\alpha][\![\beta]\!]\phi$.

The concatenation rule for $\langle\!\langle\cdot\rangle\!\rangle$ (R29) does not branch. A formula $\phi$ is true
at least once during the execution of $\alpha;\beta$ if (a) it is true at least once during
the execution of $\alpha$, or (b) $\alpha$ terminates and $\phi$ is true at least once during the
execution of $\beta$ that is started in the final state of $\alpha$.

Rules for If-then-else. The rules for if-then-else conditionals have the same form
for all four modalities, and for the modalities $[\cdot]$ and $\langle\cdot\rangle$ they are the same as in
calculi for standard DDL.

Rules for While Loops. The rules for while loops in the modalities $[\cdot]$ and $[\![\cdot]\!]$,
(R34) resp. (R37), use a loop invariant, i.e., a DLT-formula that must be true
before and after each execution of the loop body. Three premisses of (R37) are
the same as the premisses of (R34). The first one expresses that the invariant $\mathit{Inv}$
holds in the current state, i.e., before the loop is started. The second premiss
expresses that $\mathit{Inv}$ is indeed an invariant, i.e., if it holds before executing the
loop body $\alpha$, then it holds again if and when $\alpha$ terminates. And the third premiss
expresses that $\phi$ (the formula that supposedly holds after resp. throughout
executing the loop) is a logical consequence of the invariant and the negation of
the loop condition $\epsilon$, i.e., is true when the loop terminates. For the $[\![\cdot]\!]$ modality,
this last premiss is only needed for the case that $\epsilon$ is false from the beginning
and the loop body $\alpha$ is never executed. The rule for $[\![\cdot]\!]$ (R37) has an additional
premiss, which requires showing that $\phi$ remains true throughout the execution
of $\alpha$ if the invariant is true at the beginning (this latter condition follows from
the other premisses).

Example 3. Let $\alpha$ be the loop $\mathbf{while}\ \mathit{true}\ \mathbf{do}\ x := 0$. Then, because $\alpha$ does not
terminate, the sequent $x = 0 \vdash [\![\alpha;\ x := 1]\!]\, x = 0$ is valid. To prove that, we apply
rule (R28), which results in the two new proof obligations $x = 0 \vdash [\![\alpha]\!]\, x = 0$
and $x = 0 \vdash [\alpha][\![x := 1]\!]\, x = 0$. Both are easy to derive with the rules for while
loops, namely the former one with rule (R37) and the invariant $x = 0$ and the
latter one with rule (R34) and the invariant $\mathit{true}$.

The modalities $\langle\cdot\rangle$ and $\langle\!\langle\cdot\rangle\!\rangle$ are handled in a different way. Two rules are
provided for each of them. One rule, (R35) resp. (R38), allows us to “unwind”
the loop, i.e., to symbolically execute it once, provided that the loop condition $\epsilon$ is
true in the current state. The other rule, (R36) resp. (R39), is used if “unwinding”
the loop is not useful. For the $\langle\cdot\rangle$ modality that is the case if $\epsilon$ is false and the
loop terminates immediately. Rule (R39) for the $\langle\!\langle\cdot\rangle\!\rangle$ modality applies in case
the formula $\phi$, which supposedly is true at least once during the execution of
the loop, becomes true before or during the first execution of the loop body.
The rules for $\langle\cdot\rangle$ and $\langle\!\langle\cdot\rangle\!\rangle$ only work in combination with the induction rule, as
the following example demonstrates.

Footnote 1. For non-deterministic versions of DL, rule (R29) is only sound provided that the
following semantics is chosen for the $\langle\!\langle\cdot\rangle\!\rangle$ modality: $\langle\!\langle\alpha\rangle\!\rangle\phi$ is true iff $\phi$ is true at least
once in some of the (several) traces of $\alpha$. If, however, a non-deterministic semantics
is chosen where $\phi$ must be true at least once in every trace of $\alpha$ (as Pratt did for
the propositional case [11]), then rule (R29) is not correct, and indeed we failed to
find a sound rule for that kind of semantics.
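Table 4. Miscellaneous rules.

Generalisation

(R40)  $\dfrac{\phi \vdash \psi}{[\alpha]\phi \vdash [\alpha]\psi}$

(R41)  $\dfrac{\phi \vdash \psi}{\langle\alpha\rangle\phi \vdash \langle\alpha\rangle\psi}$

(R42)  $\dfrac{\phi \vdash \psi}{[\![\alpha]\!]\phi \vdash [\![\alpha]\!]\psi}$

(R43)  $\dfrac{\phi \vdash \psi}{\langle\!\langle\alpha\rangle\!\rangle\phi \vdash \langle\!\langle\alpha\rangle\!\rangle\psi}$

Quantifier/modality rules (in all four rules, $\mathit{Var}(\alpha) \subseteq \{x_1, \ldots, x_k\}$)

(R44)  $\dfrac{\Gamma,\ \forall x_1 \ldots \forall x_k\,\phi,\ [\alpha]\phi \vdash \Delta}{\Gamma,\ \forall x_1 \ldots \forall x_k\,\phi \vdash \Delta}$

(R45)  $\dfrac{\Gamma \vdash \langle\alpha\rangle\phi,\ \exists x_1 \ldots \exists x_k\,\phi,\ \Delta}{\Gamma \vdash \exists x_1 \ldots \exists x_k\,\phi,\ \Delta}$

(R46)  $\dfrac{\Gamma,\ \forall x_1 \ldots \forall x_k\,\phi,\ [\![\alpha]\!]\phi \vdash \Delta}{\Gamma,\ \forall x_1 \ldots \forall x_k\,\phi \vdash \Delta}$

(R47)  $\dfrac{\Gamma \vdash \langle\!\langle\alpha\rangle\!\rangle\phi,\ \exists x_1 \ldots \exists x_k\,\phi,\ \Delta}{\Gamma \vdash \exists x_1 \ldots \exists x_k\,\phi,\ \Delta}$

Rules for negated modalities

(R48)  $\dfrac{\Gamma \vdash \langle\alpha\rangle\neg\phi,\ \Delta}{\Gamma \vdash \neg[\alpha]\phi,\ \Delta}$

(R49)  $\dfrac{\Gamma \vdash [\alpha]\neg\phi,\ \Delta}{\Gamma \vdash \neg\langle\alpha\rangle\phi,\ \Delta}$

(R50)  $\dfrac{\Gamma \vdash \langle\!\langle\alpha\rangle\!\rangle\neg\phi,\ \Delta}{\Gamma \vdash \neg[\![\alpha]\!]\phi,\ \Delta}$

(R51)  $\dfrac{\Gamma \vdash [\![\alpha]\!]\neg\phi,\ \Delta}{\Gamma \vdash \neg\langle\!\langle\alpha\rangle\!\rangle\phi,\ \Delta}$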

Example 4. Consider the sequent $x = 0 \vdash \langle\!\langle\mathbf{while}\ \mathit{true}\ \mathbf{do}\ x := x + 1\rangle\!\rangle\, x = k$. It
states that, if the value of $x$ is 0 initially, then during the execution of the
non-terminating loop, $x$ will at least once have the value $k$. To show that this
sequent is valid, we first use the induction rule to prove that $\vdash \forall n\,\phi(n)$ is
valid, where $\phi(n) = (x \leq k \wedge n + x = k) \to \langle\!\langle\mathbf{while}\ \mathit{true}\ \mathbf{do}\ x := x + 1\rangle\!\rangle\, x = k$,
from which then the original proof obligation can be derived instantiating $n$
with $k$. The first premiss of the induction rule, $\vdash \phi(0)$, can easily be derived
with rule (R39) as $x = k$ is immediately true in case $n = 0$. The second premiss,
$\phi(n) \vdash \phi(n + 1)$, can be derived by first applying the cut rule to distinguish
the cases $x < k$ and $x = k$. In the first case, the unwind rule (R38) can be used
successfully; and the second case is again easily covered with rule (R39).
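
The semantic claim behind Example 4 can be tried out on concrete numbers. The sketch below is only a bounded simulation, not a derivation in the calculus; the function names and the `fuel` bound are assumptions made for this illustration. It counts how many iterations of the loop are needed until x = k, and checks the induction formula for small values, where n plays the role of the remaining distance k - x.

```python
def steps_until(x, k, fuel=10_000):
    """Iterations of 'while true do x := x + 1' until x = k holds,
    or None if that does not happen within `fuel` iterations."""
    for n in range(fuel + 1):
        if x == k:
            return n
        x += 1
    return None

def phi(n, x, k):
    """Bounded check of phi(n): if x <= k and n + x = k, then x = k is reached."""
    if not (x <= k and n + x == k):
        return True                     # the implication holds vacuously
    return steps_until(x, k) is not None
```

For instance, `steps_until(0, 7)` needs seven iterations, whereas `steps_until(8, 7, fuel=50)` returns `None` because x has already passed k; this is why the formula carries the guard x ≤ k.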



Miscellaneous Other Rules. There are three types of miscellaneous other
rules (see Table 4). (a) The generalisation rules (R40) to (R43) permit deriving
$\mathit{Op}\,\phi \vdash \mathit{Op}\,\psi$ from $\phi \vdash \psi$ where $\mathit{Op}$ is any of the four modal operators.
(b) Rules (R44) to (R47) allow replacing (universal) quantifications by modalities.
They are similar to the quantifier instantiation rules (R13) and (R15) and
are based on the fact that, for example, $[\![\alpha(x)]\!]\phi$ is true in a state $s$ if $\forall x\,\phi$ is
true in $s$ and $x$ is the only variable in $\alpha(x)$. (c) Rules (R48) to (R51) implement
the equivalences $\neg[\alpha]\phi \leftrightarrow \langle\alpha\rangle\neg\phi$ and $\neg[\![\alpha]\!]\phi \leftrightarrow \langle\!\langle\alpha\rangle\!\rangle\neg\phi$.

5 Soundness and Relative Completeness 

Soundness of the calculus $\mathcal{C}_{DLT}$ (Corollary 1) is based on the following theorem,
which states that all rules preserve validity of the derived sequents.

Theorem 1. For all rule schemata of the calculus $\mathcal{C}_{DLT}$, (R1) to (R51), the
following holds: If all premisses of a rule schema instance are valid sequents,
then its conclusion is a valid sequent.

Corollary 1. If a sequent $\Gamma \vdash \Delta$ is derivable with the calculus $\mathcal{C}_{DLT}$, then it
is valid, i.e., $\bigwedge\Gamma \to \bigvee\Delta$ is a valid formula.

Proving Theorem 1 is not difficult. The proof is, however, quite large as soundness 
has to be shown separately for each rule. For the assignment rules, the proof is 
based on a substitution lemma and is technically involved. 

The calculus $\mathcal{C}_{DLT}$ is relatively complete; that is, it is complete up to the
handling of the domain of computation (the data structures). It is complete if
an oracle rule for the domain is available; in our case this is one of the oracle rules for
arithmetic, (R19) and (R20). If the domain is extended with other data types,
$\mathcal{C}_{DLT}$ remains relatively complete; and it is still complete if rules for handling
the extended domain of computation are added.

Theorem 2. If a sequent is valid, then it is derivable with $\mathcal{C}_{DLT}$.

Corollary 2. If $\phi$ is a valid DLT-formula, then the sequent $\vdash \phi$ is derivable.

Due to space restrictions, the proof of Theorem 2, which is quite complex, 
cannot be given here (it can be found in [13]). The proof technique is the same 
as that used by Harel [4] to prove relative completeness of his sequent calculus 
for first-order DL. The following lemmata are central to the completeness proof. 

Lemma 1. For every DLT-formula $\phi_{DLT}$ there is an (arithmetical) FOL-formula
$\phi_{FOL}$ that is equivalent to $\phi_{DLT}$, i.e., $\mathit{val}_s(\phi_{DLT}) = \mathit{val}_s(\phi_{FOL})$ for all
states $s$.

The above lemma states that DLT is not more expressive than first-order
arithmetic. This holds as arithmetic, our domain of computation, is expressive
enough to encode the behaviour of programs. In particular, using Gödelisation,
arithmetic allows one to encode program states (i.e., the values of all the variables
occurring in a program) and finite traces into a single number. Note that the
lemma states a property of the logic DLT that is independent of the calculus.

Lemma 1 implies that a DLT-formula $\phi_{DLT}$ could be decided by constructing
an equivalent FOL-formula $\phi_{FOL}$ and then invoking the computation domain
oracle, if such an oracle were actually available. But even with a good approximation
of an arithmetic oracle, that is not practical (the formula $\phi_{FOL}$ would
be too complex to prove automatically or interactively). And, indeed, the calculus
$\mathcal{C}_{DLT}$ does not work that way.

It may be surprising that the (relative) completeness of $\mathcal{C}_{DLT}$ requires an
expressive computation domain and is lost if a simpler domain and less expressive
data structures are used. The reason is that in a simpler domain it may not be
possible to express the required invariants resp. induction hypotheses to handle
while loops.

Lemma 2. Let $\phi$ and $\psi$ be FOL-formulas, let $\alpha$ be a program, and let $M_\alpha$ be
any of the modalities $[\alpha]$, $\langle\alpha\rangle$, $[\![\alpha]\!]$, $\langle\!\langle\alpha\rangle\!\rangle$.

If the sequent $\phi \vdash M_\alpha\psi$ is valid, then it is derivable with $\mathcal{C}_{DLT}$.

This lemma is at the core of the completeness of $\mathcal{C}_{DLT}$. It is proven by induction
on the complexity of the program $\alpha$, and the proof would not go through if the
calculus lacked important rules (not all rules are indispensable; some can be
derived from other rules and are included for convenience).

Besides Lemmata 1 and 2, the completeness proof makes use of the fact that 
the calculus has the necessary rules (a) for the operators of classical logic (in par- 
ticular all propositional tautologies can be derived), and (b) for generalisation, 
(R40) to (R43). 

6 Extended Example 

Consider the program “$\mathbf{while}\ \mathit{true}\ \mathbf{do}\ \mathbf{if}\ y = 1\ \mathbf{then}\ \alpha\ \mathbf{else}\ \beta$” where $\alpha$ abbreviates
the sub-program “$x := x + 1;\ \mathbf{if}\ x = 2\ \mathbf{then}\ y := 0\ \mathbf{else}\ y := 1$” and $\beta$
stands for “$x := 0;\ y := 1$”. The program consists of a non-terminating while
loop. The loop body changes the value of $x$ between 0 and 2 and the value of $y$
between 0 and 1. We want to prove that $0 \leq x \leq 2$ is true in all states reached
by this program, if it is started in a state where $\mathit{val}_s(x) = 0$ and $\mathit{val}_s(y) = 1$
(we use $0 \leq x \leq 2$ as an abbreviation for $0 \leq x \wedge x \leq 2$). The proof is shown in
Figure 1. Its initial proof obligation is the sequent (1). First, the while loop is
eliminated applying rule (R37) with the invariant

$\mathit{Inv} := 0 \leq y \leq 1 \ \wedge\ (y = 0 \to x = 1 \vee x = 2) \ \wedge\ (y = 1 \to x = 0)$.

The formula 0 < x < 2, which is a logical consequence of Inv, does not describe 
the behaviour of the loop in sufficient detail and, therefore, is not a suitable in- 
variant itself. The result of applying rule (R37) to (1) are the four new proof obli- 
gations (2)-(5). Proof obligation (2) can immediately be derived with rule (R19). 
And, applying rule (R5) to (5) yields a sequent (5') with true on the right, which 
can be derived with rule (R2). 

In the sequel, we concentrate on the proof of (4). Proof obligation (3) can be 
derived in a similar way; its derivation is omitted due to lack of space. 

The next step is the application of rule (R32) to (4) to symbolically execute
the if-then-else statement. The result is the two proof obligations (6) and (7).
Eliminating the concatenations in (6) and (7) with applications of rule (R28)
yields (8) and (9) resp. (10) and (11). Next, we simplify (and weaken) the left
sides of (8)-(11) with the arithmetic rule (R20) (this is not really necessary but
the sequents get shorter and easier to understand). The result is the sequents
(12)-(15), respectively. The derivations of proof obligations (12), (14), and (15)
need no further explanation and are shown in Figure 1. To derive (13), we apply
(R22) and get (16). The if-then-else statement is symbolically executed with
rule (R32), which results in (17) and (18). Proof obligation (17) is derived by
applying rule (R24), which yields (19) and (20). It is easy to check that (19)
and (20) are valid FOL-sequents and can therefore be derived with the oracle
rule for arithmetic (R19).

Structure of the derivation (each entry gives a proof obligation, the rule applied to it, and the resulting premisses):

(1) by (R37): (2), (3), (4), (5)
(2) by (R19)
(3): derivation omitted
(4) by (R32): (6), (7)
(5) by (R5): (5'), which is closed by (R2)
(6) by (R28): (8), (9)
(7) by (R28): (10), (11)
(8), (9), (10), (11) by (R20): (12), (13), (14), (15), respectively
(12) by (R24): $x = 0 \vdash 0 \leq x \leq 2$ and $x' = 0,\ x = x' + 1 \vdash 0 \leq x \leq 2$, both closed by (R19)
(13) by (R22): (16)
(14) by (R24): $x = 1 \vee x = 2 \vdash 0 \leq x \leq 2$ and $x' = 1 \vee x' = 2,\ x = 0 \vdash 0 \leq x \leq 2$, both closed by (R19)
(15) by (R22): $x = 0 \vdash [\![y := 1]\!]\, 0 \leq x \leq 2$; then by (R24): $x = 0 \vdash 0 \leq x \leq 2$ and $x = 0,\ y = 1 \vdash 0 \leq x \leq 2$, both closed by (R19)
(16) by (R32): (17), (18)
(17) by (R24): (19), (20), both closed by (R19)
(18) by (R24): (19'), (20'), both closed by (R19)

The sequents:

(1)  $x = 0,\ y = 1 \vdash [\![\mathbf{while}\ \mathit{true}\ \mathbf{do}\ \mathbf{if}\ y = 1\ \mathbf{then}\ \alpha\ \mathbf{else}\ \beta]\!]\ 0 \leq x \leq 2$
(2)  $x = 0,\ y = 1 \vdash \mathit{Inv}$
(3)  $\mathit{Inv},\ \mathit{true} \vdash [\mathbf{if}\ y = 1\ \mathbf{then}\ \alpha\ \mathbf{else}\ \beta]\,\mathit{Inv}$
(4)  $\mathit{Inv},\ \mathit{true} \vdash [\![\mathbf{if}\ y = 1\ \mathbf{then}\ \alpha\ \mathbf{else}\ \beta]\!]\ 0 \leq x \leq 2$
(5)  $\mathit{Inv},\ \neg\mathit{true} \vdash 0 \leq x \leq 2$
(6)  $\mathit{Inv},\ \mathit{true},\ y = 1 \vdash [\![x := x + 1;\ \mathbf{if}\ x = 2\ \mathbf{then}\ y := 0\ \mathbf{else}\ y := 1]\!]\ 0 \leq x \leq 2$
(7)  $\mathit{Inv},\ \mathit{true},\ \neg y = 1 \vdash [\![x := 0;\ y := 1]\!]\ 0 \leq x \leq 2$
(8)  $\mathit{Inv},\ \mathit{true},\ y = 1 \vdash [\![x := x + 1]\!]\ 0 \leq x \leq 2$
(9)  $\mathit{Inv},\ \mathit{true},\ y = 1 \vdash [x := x + 1][\![\mathbf{if}\ x = 2\ \mathbf{then}\ y := 0\ \mathbf{else}\ y := 1]\!]\ 0 \leq x \leq 2$
(10)  $\mathit{Inv},\ \mathit{true},\ \neg y = 1 \vdash [\![x := 0]\!]\ 0 \leq x \leq 2$
(11)  $\mathit{Inv},\ \mathit{true},\ \neg y = 1 \vdash [x := 0][\![y := 1]\!]\ 0 \leq x \leq 2$
(12)  $x = 0 \vdash [\![x := x + 1]\!]\ 0 \leq x \leq 2$
(13)  $x = 0 \vdash [x := x + 1][\![\mathbf{if}\ x = 2\ \mathbf{then}\ y := 0\ \mathbf{else}\ y := 1]\!]\ 0 \leq x \leq 2$
(14)  $x = 1 \vee x = 2 \vdash [\![x := 0]\!]\ 0 \leq x \leq 2$
(15)  $\vdash [x := 0][\![y := 1]\!]\ 0 \leq x \leq 2$
(16)  $x' = 0,\ x = x' + 1 \vdash [\![\mathbf{if}\ x = 2\ \mathbf{then}\ y := 0\ \mathbf{else}\ y := 1]\!]\ 0 \leq x \leq 2$
(17)  $x' = 0,\ x = x' + 1,\ x = 2 \vdash [\![y := 0]\!]\ 0 \leq x \leq 2$
(18)  $x' = 0,\ x = x' + 1,\ \neg x = 2 \vdash [\![y := 1]\!]\ 0 \leq x \leq 2$
(19)  $x' = 0,\ x = x' + 1,\ x = 2 \vdash 0 \leq x \leq 2$
(20)  $x' = 0,\ x = x' + 1,\ x = 2,\ y = 0 \vdash 0 \leq x \leq 2$

Fig. 1. The derivation described in Section 6.

Applying rule (R24) to (18) yields similar FOL-sequents (19') and (20'),
which differ from (19) and (20) in that they contain $\neg x = 2$ instead of $x = 2$ and
$y = 1$ instead of $y = 0$. They, too, can be derived with the oracle (R19).
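
As an informal sanity check of the property proved in this section, the example program can simply be run and the property tested on a prefix of its trace. This sketch is not part of the derivation: the generator `run` and the prefix length are assumptions made for illustration, and testing finitely many states of an infinite trace is of course no proof.

```python
from itertools import islice

def run(x=0, y=1):
    """Yield every state (x, y) that the example program passes through."""
    yield (x, y)                      # initial state
    while True:                       # while true do ...
        if y == 1:                    # alpha: x := x + 1; if x = 2 then y := 0 else y := 1
            x = x + 1
            yield (x, y)
            y = 0 if x == 2 else 1
            yield (x, y)
        else:                         # beta: x := 0; y := 1
            x = 0
            yield (x, y)
            y = 1
            yield (x, y)

prefix = list(islice(run(), 50))
holds_on_prefix = all(0 <= x <= 2 for x, _ in prefix)
```

Each assignment contributes one state, so intermediate states such as the one between `x := 0` and `y := 1` are checked as well, which is exactly what distinguishes the trace modality from the usual box modality.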

7 Future Work 

Future work includes an implementation of our calculus $\mathcal{C}_{DLT}$, which would allow
us to carry out case studies going beyond the simple examples shown in this
paper and to test the usefulness of DLT in practice.

A useful extension of $\mathcal{C}_{DLT}$ for practical applications may be special rules for
formulas of the form $[\alpha]\phi \wedge [\![\alpha]\!]\psi$ such that splitting the two conjuncts is avoided
and they do not have to be handled in separate (but similar) sub-proofs.

Also, it may be useful to consider (a) a non-deterministic version of DLT,
and (b) extensions of DLT with further modalities such as “$\alpha$ preserves $\phi$”,
which expresses that, once $\phi$ becomes true in the trace of $\alpha$, it remains true
throughout the rest of the trace. It seems, however, to be difficult to give a
(relatively) complete calculus for this modality.
Abstract. In the verification of finite domain models (model checking),
counterexamples help the user to identify why a proof attempt has failed.
In this paper we present an approach to constructing counterexamples for
first-order goals over infinite data types, which are defined by algebraic
specifications. The approach avoids the implementation of a new calculus
by integrating counterexample search with the interactive theorem
proving strategy. The paper demonstrates that this integration requires
only a few modifications to the theorem proving strategy.



1 Introduction 

It is common knowledge that most of the time in theorem proving is spent not
on successful, but on unsuccessful proof attempts. Usually the reason is simply
that the theorem under consideration is wrong. For data types with a finite domain,
the counterexamples generated by model checkers help the proof engineer
to detect the errors. Tools that generate finite models are also available.
Unfortunately, there is no such mechanism for data types with an infinite domain.
Additionally, proofs for infinite data types usually have to be found interactively
and decisions made during the proof attempt may be incorrect.

In this paper we will demonstrate a method which generates counterexamples
for infinite data types in algebraic specifications. The method is implemented in
the interactive theorem prover KIV [7,2], which supports algebraic specifications
in the style of CASL [5] as well as state-based specifications (using imperative
programs or ASMs [3]). It avoids the need to implement a new calculus by
adapting the existing proof calculus to the construction of counterexamples.

Our paper is organized as follows: Section 2 will first give an informal overview
of the method, which is based on reduction of an unprovable goal to false. The
capabilities of the algorithm will be demonstrated with an example and we will
discuss the limitations of the approach.

After introducing some necessary notation (Sect. 3), Sect. 4 describes how
conjectures may be falsified by reduction, thereby proving the existence of a
counterexample. The backtracing algorithm that computes the actual variable
assignment and the model condition is given in Sect. 5. A number of important
implementation issues are discussed in Sect. 6. Some examples and a comparison
to related work are given in Sect. 7. The paper concludes with Sect. 8.
2 Overview

To demonstrate the capabilities of the algorithm, let us start with an example:

Example 1

We assume a data type list over a parameter type elem. Lists are generated by
the constructors nil for the empty list, and $e + l$, which adds an element $e$ to a list
$l$. The elements are assumed to be totally ordered with $\leq$. head and tail select
$e$ resp. $l$ from a list $e + l$, $\mathit{length}(l)$ computes the length of a list, $\mathit{append}(l_1, l_2)$
appends two lists, $\mathit{member}(e, l)$ tests if the element $e$ is in $l$, and $\mathit{sort}(l)$ states
that $l$ is (nonstrictly) sorted with respect to the ordering on elements. Then the
conjecture

$\phi \equiv \mathit{sort}(l_1) \wedge l_2 = \mathit{append}(l_1, \mathit{head}(l_3) + \mathit{nil}) \wedge \mathit{length}(l_3) \geq 2 * \mathit{length}(l_1)$
$\quad \wedge\ l_3 \neq \mathit{nil} \wedge \mathit{member}(\mathit{head}(l_1), \mathit{tail}(l_3)) \to \mathit{sort}(l_2)$

does not hold in all models of the specification. A counterexample consists of the
variable assignment $l_1 := e_1 + \mathit{nil}$, $l_2 := e_1 + e_2 + \mathit{nil}$, $l_3 := e_2 + \mathit{append}(l_4, e_1 + l_5)$
($e_1$, $e_2$ are variables of type elem) and the model condition $\neg\, e_1 \leq e_2$.
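
The counterexample can be checked on concrete data. The sketch below makes several assumptions that are not taken from the specification: lists are modelled as Python lists, elem as integers with the usual order, and the relation symbols in the conjecture are read as ≥, ≠ and a nonstrict ≤ ordering (the printed formula is partly illegible, so this reading is itself an assumption). The function names are hypothetical.

```python
def is_sorted(l):
    """Nonstrict sortedness: every element is <= its successor."""
    return all(a <= b for a, b in zip(l, l[1:]))

def conjecture(l1, l2, l3):
    """Truth value of the conjecture; intended for non-empty l1, since head(nil) is left open."""
    antecedent = (
        l3 != []                          # checked first so that head(l3) is defined
        and is_sorted(l1)
        and l2 == l1 + [l3[0]]            # l2 = append(l1, head(l3) + nil)
        and len(l3) >= 2 * len(l1)
        and l1[0] in l3[1:]               # member(head(l1), tail(l3))
    )
    return (not antecedent) or is_sorted(l2)

def counterexample(e1, e2, l4, l5):
    """The variable assignment from the text, for given e1, e2, l4, l5."""
    return [e1], [e1, e2], [e2] + l4 + [e1] + l5
```

With e1 = 2 and e2 = 1 (so that the model condition holds), `conjecture(*counterexample(2, 1, [], []))` is false for any choice of `l4` and `l5`; with e1 = 1 and e2 = 2 the same assignment satisfies the conjecture, which illustrates why the model condition is needed.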

The example has deliberately been chosen to be somewhat contrived, for two 
reasons. First, it should demonstrate several aspects of the counterexample 
search. Second, we want to stress that formulas encountered during interactive 
proof attempts are even more complex than the example given above, and their 
counterexamples are far from obvious. Our algorithm finds the counterexample 
within 5 seconds, which is faster than we (and we hope the reader) can guess one 
by inspecting the goal. 
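The counterexample of Example 1 can be replayed on concrete values. The following Python sketch is only an illustration and not part of KIV. It makes several assumptions: lists are modelled as Python lists and elem as integers, the ordering is read as the nonstrict order on integers, the length condition is read as length(l3) >= 2 * length(l1), and head(nil) is treated as making the premises false. All function names are hypothetical.

```python
def is_sorted(l):
    """sort(l): l is nonstrictly sorted."""
    return all(a <= b for a, b in zip(l, l[1:]))


def conjecture(l1, l2, l3):
    """Truth value of the conjecture of Example 1 for concrete lists."""
    premises = (
        is_sorted(l1)
        and l3 != []                  # l3 /= nil, so head(l3) is defined
        and l2 == l1 + [l3[0]]        # l2 = append(l1, head(l3) + nil)
        and len(l3) >= 2 * len(l1)    # assumed reading of the length condition
        and l1 != []                  # assumption: head(nil) is unspecified
        and l1[0] in l3[1:]           # member(head(l1), tail(l3))
    )
    return (not premises) or is_sorted(l2)


def counterexample(e1, e2, l4, l5):
    """l1 := e1 + nil, l2 := e1 + e2 + nil, l3 := e2 + append(l4, e1 + l5)."""
    return [e1], [e1, e2], [e2] + l4 + [e1] + l5


if __name__ == "__main__":
    # e1 = 2, e2 = 1 satisfies the model condition "not e1 <= e2".
    print(conjecture(*counterexample(2, 1, [], [])))
    # e1 = e2 violates the model condition, so the conjecture is not refuted.
    print(conjecture(*counterexample(1, 1, [], [])))
```

The second call shows why the model condition is needed: in a model where all elements are equal, the same variable assignment does not falsify the conjecture.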

In contrast to a proof, which tries to reduce the conjecture to true, the 
counterexample search tries to reduce the conjecture to false. If this is successful, 
the conjecture is not provable and a counterexample exists. Because a conjecture 
is usually not false in general, we try to find a variable assignment under which 
the conjecture can be reduced to false. 

The variable assignment is usually built up by incrementally instantiating 
variables of generated sorts with constructors, resulting in constructor terms. A 
counterexample for Example 1 which uses constructor terms only would set $l_3$ 
to $e_2 + e_1 + \mathit{nil}$. Often this incremental construction can be shortcut 
by combining it with the application of proof rules. By imposing some restrictions 
(see Sect. 4) on the ones that may be used, the more abstract counterexample 
shown above can be derived. 

Incrementally instantiating a variable in a conjecture gives several subgoals, 
one for each constructor. Since each subgoal could lead to a counterexample, we 
have implemented a heuristic search strategy to choose a case, which (hopefully) 
quickly leads to a counterexample. 

Construction of a variable assignment terminates (at the latest) when all 
variables of generated sorts have been instantiated by constructor terms. The 
remaining formula may be false, but we may also end up with a model condition. 
Model conditions occur for three reasons. First, as in the example above, we may 
have parameter sorts, and the model condition may impose a restriction on the 
models of the parameter: in the example above, $\neg\, e_1 \le e_2$ ensures that 
there exist two different elements in elem; in models where elem contains just one 
element, the conjecture would be true. A second reason for model conditions 
is underspecification: e.g., if pred(0), the predecessor of zero, is not specified 
(over natural numbers), then the counterexample search may finish with a model 
condition like pred(0) > 3. Finally, the interactive proof strategy may be too 
weak to deduce automatically that a fully instantiated goal is equivalent to false. 
This case usually arises when not enough simplifier rules have been provided for 
the data structure under consideration. Since these would be needed to automate 
(successful) proofs anyway, we simply restart the counterexample search after 
adding the required rules. 

We have found that the success of counterexample generation depends mainly 
on the efficiency of the heuristic search strategy that computes variable 
assignments. If the strategy is successful, we usually end up with trivial model 
conditions, like the ones shown above. Therefore we have not invested in 
automating the check whether the model condition is consistent with the 
specification. Instead we simply ask the user to acknowledge satisfiability. 

After the original goal has been reduced to false (or a model condition), the 
variable assignment for the original goal has to be deduced. This is done by a 
backtracing algorithm which analyzes the reduction backwards from false to the 
beginning of the counterexample search (see Sect. 5). Because the attempt to 
compute a counterexample will usually not be made on an initial conjecture, the 
counterexample will also be traced back further in the original proof attempt. 

3 Formal Basics 

The basis for our approach are first-order algebraic specifications, consisting of 
a finite, many-sorted signature with function symbols $F$, a finite set of axioms 
(first-order formulas) and a set of constructors $C \subseteq F$ (typically, 
constructors are given by clauses like nat generated by 0, suc). A sort is called a 
target sort if it is generated by some constructors, otherwise it is called a 
parameter sort. A constructor term is a term that uses only constructor functions 
and variables of parameter sorts. If syntactically different constructor terms 
always represent different elements, the data type is called a free or freely 
generated data type. nat is a free data type, while int generated by 0, suc, pred 
is not free, because $i = \mathit{pred}(\mathit{suc}(i))$. 

We assume loose semantics, i.e. an algebra A is in the class Mod(SP) of mod- 
els of a specification SP, iff the axioms hold in A and if every element contained 
in the carrier set of some target sort can be represented as the semantic value 
of a constructor term, given a suitable valuation for the parameter variables. 

In the following, we will use $x, y$ as (metavariables for) variables, $f, g$ as 
(constructor) function symbols, $\tau, \rho$ as terms, and $\varphi, \psi, \chi$ as 
formulas. Free variables $\mathit{free}(\varphi)$ are defined as usual. 
$\mathit{gen}(\varphi)$ are the free variables of target sorts (also called target 
variables), $\mathit{param}(\varphi)$ are all other free variables. 
$\varphi^{\tau}_x$ denotes substitution in formulas. 

Sequents $\Gamma \vdash \Delta$ consist of two lists $\Gamma = \varphi_1, \ldots, \varphi_n$ 
and $\Delta = \psi_1, \ldots, \psi_m$ of formulas. $\Gamma \vdash \Delta$ abbreviates 
the formula $\varphi_1 \land \ldots \land \varphi_n \rightarrow \psi_1 \lor \ldots \lor \psi_m$. 

We will use a sequent calculus for first-order logic (like the one defined in [13]) 
to derive proofs for conjectures. Proofs in this calculus are trees, in which every 
node contains a sequent. Starting with a proof tree of one node, which contains 
the initial goal $c$, a fixed, predefined set of proof rules of the form 
$\frac{p_1 \;\ldots\; p_n}{c}$ is applied to reduce the goal to simpler subgoals. 
$c$ is called the conclusion, $p_1 \ldots p_n$ are called the premises of the rule. 
By recursive application of further rules to the resulting premises, a proof tree 
is built up. The conclusion $c$ is proven if all premises in the tree are reduced 
to the trivially true premise true. 

Given these preliminaries on specifications and proofs, we can now define the 
central notion of a counterexample: 

Definition 1 (counterexample) 

Let $\varphi$ be a formula over a specification SP with $\underline{x} = \mathit{gen}(\varphi)$. 
Then a pair $C = (\chi, \underline{y} = \underline{t})$ is called a counterexample for 
the formula $\varphi$ iff $\underline{y}$ is a vector of target variables, 
$\underline{t}$ is a list of terms such that $y_i \notin t_j$ for $i < j$, 
$\mathit{gen}(\chi) = \emptyset$ and there is a model $A$ such that 

$$A \in \mathit{Mod}(SP \cup \{\mathrm{Cl}_\exists\, \chi\}) \qquad (1)$$

and 

$$SP \models (\chi \land \underline{y} = \underline{t}) \rightarrow \neg\varphi \qquad (2)$$

A counterexample for $\varphi$ is called a strong counterexample if 
$\underline{y} \subseteq \underline{x}$. 

The definition requires that there is a model $A$ of the specification SP in which 
$\chi$ is satisfiable (1), and that for every model of the specification either 
$\chi$ does not hold or $\neg\varphi$ is true under the variable assignment 
$\underline{y} = \underline{t}$ (2). The restriction $y_i \notin t_j$ for $i < j$ 
ensures that no cyclic dependencies between variables $y_i$ and terms $t_j$ exist. 

We can now prove that with this definition every non-provable formula has 
a counterexample: 

Theorem 1 

A formula $\neg\varphi$ is satisfiable over SP iff there exists a (strong) 
counterexample for $\varphi$. 



Proof 1 

If a formula $\neg\varphi$ is satisfiable, there exists a model $A \in \mathit{Mod}(SP)$ 
and a valuation $v$ where $A, v \models \neg\varphi$. Now, let 
$\mathit{gen}(\varphi) = \underline{x}$. Since the model is generated we have 
constructor terms $\underline{t}$ and valuations $v_i$, such that 
$v(x_i) = v_i(t_i)$. By suitably renaming variables in $\underline{t}$ we can find 
a common valuation $v'$, such that $A, v' \models \neg\varphi$ and 
$v'(x_i) = v'(t_i)$ for every $i$. Setting 
$\chi := \neg\varphi^{\underline{t}}_{\underline{x}}$ and 
$C := (\chi, \underline{x} = \underline{t})$, we have found a counterexample: the 
substitution theorem of first-order logic implies $A, v' \models \chi$, and this 
implies condition (1). Obviously, we also have 
$SP \models (\neg\varphi^{\underline{t}}_{\underline{x}} \land \underline{x} = \underline{t}) \rightarrow \neg\varphi$, 
i.e. condition (2) is satisfied too. 

Conversely, assume that a formula $\varphi$ has a counterexample. Choose a model 
$A \in \mathit{Mod}(SP)$ and a valuation $v$ such that $A, v \models \chi$. This is 
possible because of (1). Then the constraint $y_i \notin t_j$ for $i < j$ allows us 
to sequentially modify $v(y_1), \ldots, v(y_n)$, such that for the resulting, 
modified valuation $v'$, $A, v' \models \underline{y} = \underline{t}$ holds. Since 
$\chi$ does not contain target variables, we still have $A, v' \models \chi$. 
Finally, property (2) implies $A, v' \models \neg\varphi$, i.e. $\neg\varphi$ is 
satisfiable. 

The proof shows that we could restrict the terms assigned to the variables to be 
constructor terms. We have not done so, to allow more abstract counterexamples 
as in Example 1. 

Finally we would like to note that for efficiency reasons our algorithm will 
not construct strong counterexamples. Instead it will sometimes assign values to 
variables that are only present in intermediate goals. This is not problematic, 
since it is easy to prove that any counterexample can be transformed to a strong 
counterexample by iterative application of the following postprocessing step: 

Lemma 1 

Let $(\chi, \underline{y} = \underline{t})$ be a counterexample for $\varphi$ with 
$x = \tau \in \underline{y} = \underline{t}$ and $x \notin \mathit{free}(\varphi)$. 
Let $\underline{y}' = \underline{t}'$ be $\underline{y} = \underline{t}$ with 
$x = \tau$ removed. Then $(\chi, (\underline{y}' = \underline{t}')^{\tau}_x)$ is a 
counterexample for $\varphi$ too. 

4 Falsifying Conjectures 

We found that, with a simple restriction, the sequent calculus is already suitable 
for satisfiability reasoning instead of proving theorems. This restriction, called 
invertibility, will be discussed in the first subsection. A second advantage of 
using the sequent calculus is that we do not have to use a brute-force search for 
the correct instantiation which is needed for the counterexample. Instead we use 
an incremental instantiation strategy called structural expansion, as discussed in 
the second subsection. Finally, the last subsection exemplifies how the combined 
proof strategy falsifies a conjecture. 



4.1 Existence of a Counterexample 

To find counterexamples we cannot use every rule of the sequent calculus due 
to the fact that some of them weaken the conjecture. This means that some 
rules reduce a provable conjecture (with no counterexample) to one which is no 
longer provable (and therefore has a counterexample). We call rules which do 
not weaken conjectures invertible rules. 

Definition 2 (invertible rule) 

A rule $\frac{p_1 \;\ldots\; p_n}{c}$ is called invertible, iff for every model $A$ 
of the specification: 

$$\text{from } A \models c \text{ follows } A \models p_i \text{ for } 1 \le i \le n \qquad (3)$$

This means that if the conclusion of a rule is valid in a model $A$ of the 
specification, then every premise $p_i$ is valid in that model. This condition is 
sufficient to show the existence of a counterexample. 

Lemma 2 (existence of a counterexample) 

Let $T$ be a proof tree consisting of invertible rules, $c$ the conclusion and 
$p_1 \ldots p_n$ the premises of $T$. 

If there exists an $i$ such that $\neg p_i$ is satisfiable, then there exists a 
counterexample for the conclusion $c$. 

Proof 2 

This lemma is proven by contradiction to the validity of the conclusion $c$. 
Assuming $c$ is valid, Definition 2 and transitivity imply that all $p_i$ are valid. 
This contradicts the existence of an $i$ such that $\neg p_i$ is satisfiable. 

□ 



Fortunately, almost every rule of the sequent calculus is invertible, in particular 
all rules that eliminate propositional connectives. The main exception is of course 
the rule weakening, which drops formulas from the sequent. This rule must 
therefore be avoided when searching for a counterexample. The rules all right 
(shown below, $x_0$ is a new variable) and exists left to eliminate quantifiers are 
also invertible. Two other critical rules are the quantifier instantiation rules 
all left and exists right. These rules are invertible only if we use a version that 
does not drop the quantifier, as shown below for all left. 

$$
\frac{\Gamma \vdash \varphi^{x_0}_x, \Delta}{\Gamma \vdash \forall x.\varphi, \Delta}\ \text{all right}
\qquad
\frac{\varphi^{\tau}_x, \forall x.\varphi, \Gamma \vdash \Delta}{\forall x.\varphi, \Gamma \vdash \Delta}\ \text{all left}
$$



4.2 Searching Counterexamples 

To compute the variable assignment for a counterexample, we could in principle 
use a generate and test algorithm, which would enumerate all constructor terms. 
E.g. for a formula involving two variables $m, n$ over natural numbers, we could 
try to instantiate the variables with all pairs 
$(\mathit{suc}^i(0), \mathit{suc}^j(0))$ of constructor terms. But this would be 
very inefficient. Consider e.g. a goal of the form $\varphi(m, n) \land m \le n$. 
Then for $n = 0$ it is totally redundant to consider any instance of $m$ except 0, 
since the formula obviously has no counterexample for other cases of $m$. 

Therefore a more elaborate search for the existence of a counterexample is 
needed, which exploits constraints (like $m \le n$) present in the conjecture to 
avoid the enumeration of as many cases as possible. To this end, the variables are 
instantiated stepwise with respect to the constructors, and thereafter proof rules 
simplify the resulting formula. If, for example, variable $n$ is instantiated with 0 
(and $\mathit{suc}(n_1)$), then the conjecture can immediately be simplified to 
$\varphi(0, 0)$ (by the fact $m \le 0 \leftrightarrow m = 0$), thereby avoiding any 
enumeration of instances of $m$. This method of stepwise instantiation is called 
structural expansion. 

Definition 3 (structural expansion) 

Let $\varphi$ be a formula with a free variable $x$ and $\mathit{sort}(x) = s$ 
generated by $f_1 \ldots f_n$. Furthermore let 
$\underline{x}_1 \ldots \underline{x}_n$ be proper argument vectors for 
$f_1 \ldots f_n$ that are not free in $\varphi$. Then the step from $\varphi(x)$ to 

$$\varphi(f_1(\underline{x}_1)) \lor \ldots \lor \varphi(f_n(\underline{x}_n))$$

is called structural expansion. 

Structural expansion spans a search tree and partly instantiates the free variables 
of the conjecture. Because the structural expansion stepwise enumerates every 
constructor term - for a variable every possible prefix is generated - the complete 
search space is generated. This is true because the data types are generated by 
the constructors, i.e. every value of a variable can be represented by a constructor 
term. The structural expansion may be included in the sequent calculus through 
an additional rule called constructor cut. 
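The pruning effect of structural expansion on the natural number example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the KIV implementation: the goal is assumed to have the guarded shape "m <= n implies phi(m, n)", n is fixed to a concrete value, and a partially instantiated term suc^k(x) is represented by the hypothetical pair (k, closed).

```python
from collections import deque


def expand(term):
    """Structural expansion of the open variable x in suc^k(x):
    x := 0 or x := suc(x')."""
    k, closed = term
    assert not closed
    return [(k, True), (k + 1, False)]


def falsify_for_n(phi, n):
    """Search an m with m <= n that falsifies phi(m, n) for a fixed n.

    A term (k, closed) stands for suc^k(0) if closed, otherwise for
    suc^k(x) with x still uninstantiated.
    Returns (m or None, list of fully instantiated values that were tested).
    """
    tested = []
    queue = deque([(0, False)])      # m is completely uninstantiated
    while queue:
        k, closed = queue.popleft()
        if k > n:                    # guard m <= n is refuted for every instance,
            continue                 # so no counterexample exists in this branch
        if closed:
            tested.append(k)
            if not phi(k, n):
                return k, tested
        else:
            queue.extend(expand((k, closed)))
    return None, tested


if __name__ == "__main__":
    # For n = 0 only m = 0 is ever tested; the branch suc(m') is pruned at once.
    print(falsify_for_n(lambda m, n: True, 0))
    print(falsify_for_n(lambda m, n: m + n != 3, 2))
```

A generate and test loop would have to bound the enumeration of m artificially; here the guard terminates the expansion of m by itself.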

Definition 4 (constructor cut) 

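The constructor cut rule has the form 

$$
\frac{(\Gamma \vdash \Delta)^{f_1(\underline{x}_1)}_x \quad \ldots \quad (\Gamma \vdash \Delta)^{f_n(\underline{x}_n)}_x}{\Gamma \vdash \Delta}\ \text{constructor cut}
$$

where $x \in \mathit{gen}(\Gamma \vdash \Delta)$ and $\mathit{sort}(x) = s$ generated 
by $f_1, \ldots, f_n$. 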

To use this rule in the counterexample search it has to be correct and invertible. 



Lemma 3 (correctness and invertibility of constructor cut) 

The rule constructor cut (see Def. 4 ) is correct and invertible. 

Because the constructor cut rule is itself a weak form of the induction principle, 
the proof is easy using structural induction. 

Adding the constructor cut rule to the proof search and avoiding non-invertible 
rules are the only changes needed to adapt the usual sequent calculus to 
counterexample search. 

To have a complete proof strategy in the sense that every vector of constructor 
terms $\underline{t}$ is considered, we have to use a fair search strategy which 
ensures that every branch is ultimately considered and that every variable is 
ultimately expanded with the constructor cut rule. Then every constructor term 
instance of every variable will ultimately be tried as a potential counterexample. 
A simple fair search strategy would be breadth-first search on the branches of the 
proof tree, combined with expanding one of the variables that has been expanded 
the fewest times before. A more elaborate strategy is discussed in Sect. 6.2. 

4.3 Generating a Counterexample Proof 

We will now exemplify the counterexample search. 

Example 2 

The following conjecture 

$$\varphi \equiv \mathit{sort}(l_1) \land \mathit{sort}(l_2) \rightarrow \mathit{sort}(\mathit{append}(l_1, l_2))$$

does not hold in all models of the specification. A counterexample consists of 
the variable assignment $l_1 := e_1 + \mathit{nil}$, $l_2 := e_2 + \mathit{nil}$ and 
the model condition $\neg\, e_1 \le e_2$. 

The following proof tree gives an impression how the counterexample generation 
works. 

Example 3 



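$$
\begin{array}{ll}
e_1 \le e_2 & \\
\hline
\mathit{sort}(e_2 + \mathit{nil}) \rightarrow \mathit{sort}(\mathit{append}(e_1 + \mathit{nil}, e_2 + \mathit{nil})) \quad \cdots & \text{simplify} \\
\hline
\mathit{sort}(e_2 + l_2') \rightarrow \mathit{sort}(\mathit{append}(e_1 + \mathit{nil}, e_2 + l_2')) \quad \cdots & \text{con. cut } l_2' \\
\hline
\mathit{sort}(l_2) \rightarrow \mathit{sort}(\mathit{append}(e_1 + \mathit{nil}, l_2)) & \text{con. cut } l_2 \\
\hline
\mathit{sort}(e_1 + \mathit{nil}) \land \mathit{sort}(l_2) \rightarrow \mathit{sort}(\mathit{append}(e_1 + \mathit{nil}, l_2)) \quad \cdots & \text{simplify} \\
\hline
\mathit{sort}(e_1 + l_1') \land \mathit{sort}(l_2) \rightarrow \mathit{sort}(\mathit{append}(e_1 + l_1', l_2)) \quad \cdots & \text{con. cut } l_1' \\
\hline
\mathit{sort}(l_1) \land \mathit{sort}(l_2) \rightarrow \mathit{sort}(\mathit{append}(l_1, l_2)) & \text{con. cut } l_1
\end{array}
$$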



Starting with the conjecture (conclusion of the proof tree), the rule constructor 
cut stepwise introduces the constructor terms which build up the counterexample. 
Dots indicate branches of the proof tree which are not relevant to the 
outcome. Applications of constructor cut are alternated with applications of 
sequent calculus rules. In our case only the simplifier rule is needed, which 
simplifies goals using rewrite rules: in the example the recursive definitions of 
append and sort are used as rewrite rules to simplify e.g. 
$\mathit{sort}(e_1 + \mathit{nil})$ to true. The search stops at the formula 
$e_1 \le e_2$, because there are no more target variables to instantiate and the 
predicate cannot be decided automatically. Under the assumption that 
$\neg\, e_1 \le e_2$ is satisfiable, which the user acknowledges to be true, there 
exists a counterexample for the conclusion. 
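The search of Example 3 can be imitated by a small breadth-first search over partially instantiated lists. The Python sketch below is only an illustration under assumptions of ours: a list term is a pair (elements, is_open), element variables are named x1, x2, ... for the first list and y1, y2, ... for the second, and the "simplifier" is hard-coded for this one conjecture, including the lemma append(l, nil) = l. None of these names or simplifications are taken from KIV.

```python
from collections import deque


def simplify(l1, l2):
    """Reduce sort(l1) & sort(l2) -> sort(append(l1, l2)) for partial lists.

    Returns True if the goal is proved, a residual (premises, (a, b)) that
    stands for 'premises -> a <= b' if both lists are fully instantiated,
    and None if further expansion is needed.
    """
    (xs, open1), (ys, open2) = l1, l2
    if not open1 and not xs:
        return True                   # append(nil, l2) = l2
    if not open2 and not ys:
        return True                   # append(l1, nil) = l1 (assumed lemma)
    if not open1 and not open2:
        premises = list(zip(xs, xs[1:])) + list(zip(ys, ys[1:]))
        return premises, (xs[-1], ys[0])
    return None


def cut(term, prefix):
    """Constructor cut on the open tail: tail := nil, or tail := e + tail'."""
    elems, _ = term
    fresh = f"{prefix}{len(elems) + 1}"
    return [(elems, False), (elems + (fresh,), True)]


def search(max_steps=1000):
    """Breadth-first counterexample search; returns (l1, l2, residual) or None."""
    queue = deque([(((), True), ((), True))])
    for _ in range(max_steps):
        if not queue:
            return None
        l1, l2 = queue.popleft()
        result = simplify(l1, l2)
        if result is True:
            continue                  # branch proved, no counterexample here
        if result is not None:
            return l1, l2, result
        # Fairness: expand the open variable that has been expanded least.
        if l1[1] and (not l2[1] or len(l1[0]) <= len(l2[0])):
            queue.extend((c, l2) for c in cut(l1, "x"))
        else:
            queue.extend((l1, c) for c in cut(l2, "y"))
    return None


if __name__ == "__main__":
    print(search())
```

The first residual goal found has no premises and the conclusion x1 <= y1, which corresponds to the assignment and model condition of Example 2 up to renaming of the element variables.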



5 The Counterexample 

In Section 4 we discussed how to prove the existence of a counterexample. In 
this section we will discuss how to reconstruct the variable assignment from a 
failed proof attempt, by tracing an initial variable assignment back through the 
proof attempt. The first subsection considers the proof generated by the 
counterexample search, which uses invertible rules only, whereas the second 
subsection extends the approach to arbitrary sequent calculus proofs. 

5.1 Computing the Counterexample 

We will compute the counterexample backwards, beginning at a premise $\varphi'$. 
According to Sect. 4, this premise is either false, or a formula without free 
target variables whose negation is satisfiable. In either case, 
$C = (\neg\varphi', \emptyset)$ is a counterexample. The task is now to compute a 
counterexample for every goal contained in the branch of the considered premise 
by backtracing through it. To adapt the counterexample from $\varphi'$ to the 
previous formula $\varphi$ in the backtrace path, the rule which deduces 
$\varphi'$ from $\varphi$ has to be considered. For some rules of the sequent 
calculus the variable assignment has to be adjusted. We call rules where 
the assignment needs no adjustment strong invertible rules. 

Definition 5 (strong invertible rule) 

A rule $\frac{p_1 \;\ldots\; p_n}{c}$ is called strong invertible, iff 
$A \models c \rightarrow p_1 \land \ldots \land p_n$ holds for every model $A$ of 
the specification. 

It is easy to prove that this definition implies: 

Lemma 4 (backtrace through strong invertible rules) 

Let $\frac{p_1 \;\ldots\; p_n}{c}$ be a strong invertible rule and 
$C = (\chi, \underline{x} = \underline{t})$ a counterexample for a premise $p_i$. 
Then $C$ is also a counterexample for the conclusion $c$. 

In the sequent calculus most rules that are invertible, are also strong invertible. 
Especially all propositional rules, and the quantifier rules are strong invertible 
as well as the rule to apply a lemma. The only invertible rules, which are not 
strong invertible are insert equation, structural induction, and constructor cut. 

Example 4 (insert equation) 

The insert equation rule 

$$\frac{\Gamma^{\tau}_x \vdash \Delta^{\tau}_x}{x = \tau, \Gamma \vdash \Delta}\ \text{insert equation}$$

with $x \notin \mathit{vars}(\tau)$ is an invertible rule. Let 

$$\frac{\vdash \mathit{prime}(4)}{x = 4 \vdash \mathit{prime}(x)}\ \text{insert equation}$$

be an instantiation of the rule where the predicate prime(x) is true iff x is a prime 
number. Because 4 is not prime there exists a counterexample 
$C = (\mathit{true}, \emptyset)$ with no conditions for this premise. But for a 
model $A$ and the variable assignment $v(x) = 3$ the antecedent of the conclusion 
is refuted, so the conclusion holds; therefore in general $C$ is not a 
counterexample for the conclusion. 

The example shows that the counterexample for the premise is not a 
counterexample for the conclusion. We have to adapt the counterexample by adding 
the dropped equation: $C' = (\mathit{true}, x = 4)$. In general the adaption for 
the three invertible (but not strong invertible) rules is as follows: 

Theorem 2 (adaption for insert equation) 

Let $C = (\chi, \underline{x} = \underline{t})$ be a counterexample for the premise 
of the insert equation rule. Then 
$C' = (\chi, (\underline{x} = \underline{t})^{x_0}_x \land x = \tau)$ (where $x_0$ 
is a new variable) is a counterexample for the conclusion. 

The structural induction rule is similar to the constructor cut rule (see Def. 4), 
but adds an induction hypothesis of the form 
$\forall \underline{y}.(\Gamma \rightarrow \Delta)^{x_i}_x$ for every $x_i$ in 
$\underline{x}$ ($\underline{y}$ are all free variables of the conclusion 
$\Gamma \vdash \Delta$ except the induction variable $x$). 

Theorem 3 (adaption for structural induction and constructor cut) 

Let $C = (\chi, \underline{x} = \underline{t})$ be a counterexample for the premise 
with $f_i(\underline{x}_i)$ of the structural induction or the constructor cut rule. 
Then $C' = (\chi, (\underline{x} = \underline{t})^{x_0}_x \land x = f_i(\underline{x}_i))$, 
where $x_0$ is a new variable, is a counterexample for the conclusion. 

The correctness proofs for these adaptions are not too difficult; details are given 
in [10]. In the following, we will show the backtrace algorithm applied to Ex. 3. 

Example 5 

The backtrace begins at the premise. Because the negation of $e_1 \le e_2$ is 
satisfiable, the counterexample for the premise is 
$C = (\neg\, e_1 \le e_2, \emptyset)$. At the simplifier rule no adaption is 
necessary, but at the constructor cut rule ($l_2' = \mathit{nil}$) the 
counterexample is adapted to $C = (\neg\, e_1 \le e_2, l_2' = \mathit{nil})$, and 
stepwise further to 
$C = (\neg\, e_1 \le e_2, (l_2' = \mathit{nil}, l_2 = e_2 + l_2'))$, which also holds 
for the premise of the simplification rule, 
$C = (\neg\, e_1 \le e_2, (l_2' = \mathit{nil}, l_2 = e_2 + l_2', l_1' = \mathit{nil}))$, and 
$C = (\neg\, e_1 \le e_2, (l_2' = \mathit{nil}, l_2 = e_2 + l_2', l_1' = \mathit{nil}, l_1 = e_1 + l_1'))$. 
Finally, since neither $l_1'$ nor $l_2'$ appears in the conclusion, the 
counterexample may be simplified to the strong counterexample 
$C = (\neg\, e_1 \le e_2, (l_2 = e_2 + \mathit{nil}, l_1 = e_1 + \mathit{nil}))$ 
using Lemma 1. 

Because the search for a counterexample applies only invertible rules and the 
counterexample can be backtraced over these rules, a variable assignment for 
the free variables of disproved conjectures may be derived. 
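The postprocessing step of Lemma 1, which turns the backtraced assignment into a strong counterexample, amounts to eliminating intermediate variables by substitution. The following Python sketch illustrates this on the assignment of Example 5. It is a simplified illustration, not KIV code: terms are nested tuples, variables are strings, and the renaming to new variables used in Theorems 2 and 3 is omitted because all variable names in the example are already distinct.

```python
NIL = ("nil",)


def cons(e, l):
    """The list constructor e + l as a term."""
    return ("cons", e, l)


def substitute(term, var, repl):
    """Replace every occurrence of variable `var` in `term` by `repl`."""
    if isinstance(term, str):
        return repl if term == var else term
    return (term[0],) + tuple(substitute(a, var, repl) for a in term[1:])


def to_strong(assignment, goal_vars):
    """Repeatedly drop an equation x = tau with x not free in the goal
    and substitute tau for x in the remaining equations (Lemma 1)."""
    eqs = list(assignment)
    while True:
        extra = next(
            (i for i, (v, _) in enumerate(eqs) if v not in goal_vars), None
        )
        if extra is None:
            return eqs
        var, tau = eqs.pop(extra)
        eqs = [(v, substitute(t, var, tau)) for v, t in eqs]


def show(term):
    """Pretty-print a list term in the notation e + l."""
    if isinstance(term, str):
        return term
    if term == NIL:
        return "nil"
    return f"{show(term[1])} + {show(term[2])}"


if __name__ == "__main__":
    backtraced = [
        ("l2'", NIL),
        ("l2", cons("e2", "l2'")),
        ("l1'", NIL),
        ("l1", cons("e1", "l1'")),
    ]
    for var, term in to_strong(backtraced, {"l1", "l2"}):
        print(var, "=", show(term))
```

Each iteration removes one equation, so the loop terminates after at most as many steps as there are intermediate variables.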



5.2 Earliest Point of Failure 

In interactive proof systems the user does not only look at the conjecture he 
wants to prove, but also at formulas which are derived during a proof. Therefore 
the counterexample search will usually not be started at the conjecture under 
consideration (why should one try to prove a conjecture assumed to be wrong?) 
but at some ‘fishy’ goal within the proof tree. If a counterexample can be found 
for that goal, the user's interest is whether some proof decisions were incorrect 
or whether the conjecture itself is faulty. Therefore the counterexample has 
to be backtraced in the original proof attempt tree as well. 

Because the counterexample search already uses the (restricted) proof rules, 
the backtrace algorithm of Sect. 5.1 may also be used for backtracing in the origi- 
nal proof, as long as only invertible rules have been used. For non-invertible rules 
(like weakening formulas) the best that may be done are two proof attempts: 
the first tries to show that the counterexample can be propagated through the 
rule (by proving $(\chi \land \underline{x} = \underline{t}) \rightarrow \neg\varphi$), 
the second tries to show that it surely cannot be propagated (by proving 
$(\chi \land \underline{x} = \underline{t}) \rightarrow \varphi$). If the first proof attempt 
succeeds, backtracing can proceed. If the second attempt succeeds, the earliest 
point of failure has been found, i.e. application of the rule is at least one of the 
reasons why the proof attempt failed. In KIV two incomplete proof attempts are 
done just using the built-in simplifier (these do not cost much time). If both fail, 
more elaborate proof attempts are optional, but rarely used. Since wrong con- 
jectures are much more common than wrong proof decisions, one usually skips 
non-invertible rules and tries to compute a counterexample for the conclusion 
first. Only if this computed counterexample is found not to be an actual coun- 
terexample for the conclusion, because it was incorrectly propagated through a 
non-invertible rule, more elaborate checks are necessary. 

6 Implementation 

The previous sections have shown the approach for the computation of counterex- 
amples on infinite data types. This approach is tightly integrated in the proof 
engine of the KIV system. When the user proves a conjecture and gets stuck 
at some subgoal he might start the counterexample search. This strategy takes 
the subgoal and tries to reduce it to false with a special set of counterexample 
heuristics. These heuristics expand a free target variable structurally and sim- 
plify the resulting premises. Thereafter the premises are weighted (see Sect. 6.2) 
and a ‘good’ premise, i.e. one that hopefully quickly leads to a counterexample, 
is chosen and structurally expanded again. This process continues until 
one premise has no more free target variables. If this premise is false or the 
user agrees, that the negation is satisfiable, the backtrace algorithm evaluates 
the variable assignment for the free target variables of the subgoal. Thereafter 
the counterexample may also be backtraced in the original proof attempt to the 
earliest point of failure (see Sect. 5.2). 

In this section we will first discuss implementation details that are specific 
to the sequent calculus and the heuristics to automate proofs used in KIV. 
Second we will give some heuristics to get an efficient search strategy for the 
counterexample proofs and finally shortly discuss extensions to Dynamic Logic 
and to higher-order logic. 

6.1 KIV Specific Rules and Heuristics 

To simplify formulas with axioms and theorems from the underlying 
specification, KIV uses a simplifier. The simplifier does propositional reasoning 
and applies theorems from the specification as forward, elimination or conditional 
rewrite rules. Details on the various types of rules can be found in [9]. Additional 
heuristics do quantifier instantiations and unfolding of function definitions. 

To optimize the proof search for non-free data types, a variant of the 
structural expansion rule is implemented. The modified rule adds preconditions 
to each premise. If variable $x$ is expanded to $f(\underline{x})$ in a premise, 
inequations $f(\underline{x}) \neq x_i$ for each argument $x_i \in \underline{x}$ 
that has the same sort as $x$ are added. This ensures that the constructor does 
not behave like the identity on some argument. E.g. when expanding a variable $s$ 
for a set to $s' \cup \{a\}$, the inequality $s' \cup \{a\} \neq s'$ is added, which 
ensures that the element $a$ was not already present in $s'$. This 
halves the search tree because the case of the identity is omitted. 
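The effect of the added inequation can be seen by enumerating constructor terms for finite sets. The Python sketch below is an illustration under our own assumptions (sets over a small element domain, terms represented by their insertion sequence); it compares plain expansion with expansion that skips the identity case $s' \cup \{a\} = s'$.

```python
def terms(elems, depth, prune):
    """Yield (insertion sequence, denoted set) for all constructor terms
    built from the empty set with at most `depth` insertions.

    With prune=True an insertion of a into s is skipped whenever a is
    already in s, i.e. the inequation s U {a} /= s is enforced.
    """
    frontier = [((), frozenset())]
    yield from frontier
    for _ in range(depth):
        nxt = []
        for seq, s in frontier:
            for a in elems:
                if prune and a in s:
                    continue          # identity case: s U {a} = s
                nxt.append((seq + (a,), s | {a}))
        yield from nxt
        frontier = nxt


if __name__ == "__main__":
    plain = list(terms((0, 1), 2, prune=False))
    pruned = list(terms((0, 1), 2, prune=True))
    # Fewer terms are generated, yet the same sets are still reachable.
    print(len(plain), len(pruned))
    print({s for _, s in plain} == {s for _, s in pruned})
```

The pruned enumeration still reaches every set, so no counterexample is lost by adding the inequation.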



6.2 Search Strategies for the Counterexample 

Instantiating variables using structural expansion has to be done in a fair way: 
every premise and every variable of each premise has to be eventually 
considered. KIV uses an A* search strategy to implement a fair search. The strategy 
computes a weight for each open goal and always continues the counterexample 
search at the branch with the smallest weight. To be fair, it has to be ensured 
that there is a $\delta > 0$ such that the weight of goals increases at least by 
$\delta$ when another rule is applied [11]. This is easily achieved by adding some 
weight proportional to the branch length. Further criteria rely on the assumption 
that less complicated goals lead more quickly to a counterexample. 

Therefore the complexity is measured by considering the number of formulas 
and different variables of the goal, the number of times the structural expansion 
was applied, and the complexity of the symbols corresponding to the specification 
hierarchy. KIV supports structured algebraic specifications [9] with the usual 
operations (union, enrichment, renaming, parameterization and actualization) 
which form this hierarchy. E.g. the specification of list (see Ex. 1) may be 
an enrichment of a list specification with the definition of the sort predicate, 
which may in turn be a generic specification with elem as parameter. Usually 
axioms that define operations in specifications higher up in the hierarchy are 
based on symbols lower in the hierarchy (as the definition of sort is based on 
elementary list constructors). Therefore symbols higher up in the hierarchy carry 
more weight than those lower in the hierarchy. 

The expansion of a variable should reduce the complexity of the current 
formula. Therefore variables get a weight corresponding to the specification hi- 
erarchy of the sort, the number of occurrences (since expansion expands all 
occurrences), and, to get a fair strategy, a negative weight proportional to the 
number of times the variables have been expanded. Then the variable with the 
highest weight is chosen. 
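The fairness condition on goal weights can be illustrated by a generic weighted best-first search. The Python sketch below is not the KIV strategy: the function names, the `complexity` parameter and the toy search space are assumptions made for illustration. It only shows the mechanism that each rule application adds at least `delta` to the weight, so that no branch is postponed forever.

```python
import heapq
from itertools import count


def best_first(start, expand, is_falsified, complexity, delta=1.0,
               max_steps=10_000):
    """Always continue at the open goal with the smallest weight.

    weight(goal) = complexity(goal) + delta * depth, so every expansion step
    increases the weight by at least delta (for bounded complexity values).
    Returns (goal, depth) for the first falsified goal found, else None.
    """
    tie = count()                       # tie-breaker, goals are never compared
    heap = [(complexity(start), next(tie), 0, start)]
    for _ in range(max_steps):
        if not heap:
            return None
        _, _, depth, goal = heapq.heappop(heap)
        if is_falsified(goal):
            return goal, depth
        for child in expand(goal):
            weight = complexity(child) + delta * (depth + 1)
            heapq.heappush(heap, (weight, next(tie), depth + 1, child))
    return None


if __name__ == "__main__":
    # Toy search space: a "goal" is a number, expansion yields g + 1 and 2 * g,
    # and the goal 10 plays the role of a goal reduced to false.
    print(best_first(1, lambda g: [g + 1, 2 * g], lambda g: g == 10,
                     complexity=lambda g: 0))
```

With a constant complexity the search degenerates to breadth-first search; a real complexity measure would reorder goals of similar depth without giving up fairness.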

The exact factors used in the weights were chosen by evaluating a number of 
experiments. They do a good job in practice. 

6.3 Extending the Method 

Because the presented method for counterexample search uses a proof calcu- 
lus, the approach may easily be extended. The KIV system supports program 
verification of imperative programs in Dynamic Logic [8]. To extend the coun- 
terexample search to Dynamic Logic, we only had to check that the program 
rules (mainly rules for the symbolic execution of programs) are (strong) 
invertible. We found that almost all are strongly invertible, with only one 
exception: in Hoare’s invariant rule, the introduction of an incorrect program 
invariant leads to a non-invertible rule. Therefore this rule is not used in the 
counterexample search for programs, and backtracing through the proof stops at the 
invariant rule. The counterexample search for program verification is also implemented 
and works with good results. The basic strategy is to expand the input variables 
and then to execute the programs until the program formulas are reduced to pure 
predicate logic formulas. Then the usual counterexample search can proceed. 

Another extension of the proof calculus would be to apply it to higher-order 
goals (KIV already uses a higher-order logic). The new proof rules (beta reduc- 
tion and rules for the axioms of choice and extensionality) are not problematic, 
since they are strongly invertible, but higher-order variables present a problem, 
which we could not solve yet: since functions are not generated data types, we 
do not have a simple mechanism available, that can find instances of function 
variables. 



7 Related Work and Examples 

A large number of tools has been developed which try to construct finite models 
[12,15,14], even for first-order specifications [4]. Since our specifications usually 
do not have finite models at all, these tools are not directly applicable. Neverthe- 
less an interesting alternative to our approach would be to use them by defining 
suitable abstractions (e.g. by collapsing all data structures with more than a 
fixed number of constructors into a single element) . Model generator tools could 
also be used to check model conditions over parameter specifications, since these 
often can be satisfied with finite models. 

The idea to incrementally search for constructor terms to generate the search 
space for counterexamples also appears in [6] and in [1]. The first approach is 
restricted to the special case of initial specifications over free data types with 
complete recursive definitions. In this case the satisfiability of 
$\neg\varphi^{\underline{t}}_{\underline{x}}$ can be decided by unfolding 
definitions and the inequality between syntactically different constructor terms. 
This special case makes it possible to prove a completeness result for the 
counterexample search. 

The second approach in [1] is restricted to freely generated data types, and 
tries to integrate counterexample search with the calculus of a model generation 
prover. 

Both approaches differ from ours in that they try to develop a fully automatic 
calculus for the counterexample search, while we have developed an approach 
that is tightly integrated with the paradigm of interactive theorem proving. 
Nevertheless our approach is able to solve the problems given in [6] within a few 
seconds. No residual formulas remain; the solutions are found automatically. 

Summarizing, we have developed a practically applicable approach to
counterexample generation, which can deal with arbitrary data types. Below we
give a small number of examples for graphs. Graphs $g$ are generated from the
empty graph $\emptyset$ by inserting nodes $a$ and edges $a \mapsto b$ with $g +_v a$ and
$g +_e (a \mapsto b)$. Adding an edge $(a \mapsto b)$ implicitly adds both nodes $a$ and $b$.
The data type is not free, since $g +_v a +_v a = g +_v a$. A path in a graph will
be written $[a_1, \ldots, a_n]$. Here are some conjectures over graphs and the
corresponding counterexamples:

1. every graph $g$ is acyclic:

$C = (\mathit{true}, \{g = \emptyset +_e (a \mapsto a)\})$

2. every path $x$ in a graph $g$ is a shortest path (i.e. no path $y$ is shorter):

$C = (\mathit{true}, \{g = \emptyset +_e (a \mapsto b),\ x = [a, b],\ y = [a]\})$

3. if $g$ has no edge of the form $(a \mapsto a)$, then it is acyclic:

$C = (a \neq b, \{g = \emptyset +_e (a \mapsto b) +_e (b \mapsto a)\})$

4. two paths $x$ and $y$ in an acyclic graph $g$ with same source and same
destination are equal:

$C = (a_0 \neq a_1 \land a_1 \neq a_2 \land a_0 \neq a_2, \{x = [a_0, a_1],\ y = [a_0, a_2, a_1],$
$g = \emptyset +_e (a_0 \mapsto a_2) +_e (a_2 \mapsto a_1) +_e (a_0 \mapsto a_1)\})$

Our approach solves the first two examples automatically, the other two require 
that the user acknowledges the model condition. The examples are typical for 
KIV case studies, which often use abstract data types like the one above. They 
are atypical, since the goals are small and only one data type with only a few 
axioms is involved. More examples over larger specifications can be found in [10]. 
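
The flavour of such a search can be illustrated with a small Python sketch. It is not the KIV mechanism, which works inside a sequent calculus; it is a plain brute-force enumeration, and the names `has_cycle` and `find_counterexample` as well as the encoding of a graph as a set of edge pairs are assumptions made only for this illustration. Graphs are built from the empty graph by adding edges, smallest constructor terms first, and the first graph that falsifies the conjecture is returned.

```python
from itertools import product


def has_cycle(edges):
    """Detect a directed cycle by depth-first search."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node that is still on the DFS stack
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))


def find_counterexample(conjecture, nodes, max_edges):
    """Try edge sequences of growing length; return the first falsifying one."""
    all_edges = list(product(nodes, repeat=2))
    for size in range(max_edges + 1):
        for edge_seq in product(all_edges, repeat=size):
            if not conjecture(frozenset(edge_seq)):
                return edge_seq
    return None


# Conjecture 1: every graph is acyclic.
acyclic = lambda g: not has_cycle(g)
# Conjecture 3: a graph without self-loops is acyclic.
no_loop_implies_acyclic = lambda g: any(s == d for s, d in g) or not has_cycle(g)

print(find_counterexample(acyclic, ["a", "b"], 2))
print(find_counterexample(no_loop_implies_acyclic, ["a", "b"], 2))
```

With two nodes the sketch finds the self-loop for the first conjecture and the two-edge cycle between a and b for the third, matching the shape of the counterexamples listed above.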

8 Conclusion 

We have presented an approach for the generation of counterexamples in alge- 
braic specifications. The counterexample search is based on a sequent calculus for 
first-order logic (and on Dynamic Logic for program verification). The approach 
extends other approaches limited to free data types, and has been implemented 
into the specification and verification tool KIV. 

The use of the existing proof calculus led to a relatively short implementation
time, and allowed us to reuse the already existing, well-tested and efficient strategies
for theorem proving in the search for a counterexample.

The counterexample mechanism already has been successfully applied in sev- 
eral applications as a supporting feature to aid the proof engineer in detecting 
flaws in conjectures. 

The applications have shown that the satisfiability of model conditions usually
is decided easily by the user. Nevertheless an interesting topic for further
research would be to automate this test further. Some cases are easy to automate,
e.g. terms like pred(0) or head(nil) could easily be checked to be unspecified by
inspecting the specification of natural numbers or lists. Model conditions for
parameters like $e_1 < e_2$ could be solved either by a decision procedure for total
orders or by a model generation tool (finite models often are sufficient for this
case). Care has to be taken that the automation does not waste a lot of
unnecessary time, since usually the variable assignment computed is already sufficient
to deduce why a conjecture is wrong.

Another topic for further research is the restriction of the search space for
non-free data types. Currently only the restriction mentioned at the end of Sect. 6.1
has been implemented, but commutativity constraints like
$g +_v a +_v b = g +_v b +_v a$ (for graphs) or cancellation laws like $n + 1 - 1 = n$
(for integers) can also be exploited to reduce the search space.

Abstract. Redundancy elimination is a key feature for the efficiency of
saturation based theorem provers. Ground joinability is a good candidate 
for a redundancy criterion but is rarely used in practice since the available 
algorithms are not believed to have a good cost-benefit ratio. In order to 
have a framework for the evaluation and the design of new methods for 
testing ground joinability we developed the system CCE. 



1 Introduction 

Redundancy elimination is a key feature for the efficiency of saturation based
theorem provers [BG94]. In equational theorem proving one therefore needs methods
for detecting that an equation $s = t$ is redundant w.r.t. $(R, E, \succ)$. Here $\succ$
denotes a reduction ordering, $R$ a rewrite system with $R \subseteq {\succ}$ and $E$ a set of
equations which are not orientable by $\succ$. The stronger the chosen notion of
redundancy is, the harder it becomes to decide; it may even become undecidable.
Ground joinability of $s = t$ w.r.t. $(R, E, \succ)$ is a good candidate for such a
criterion since it can be approximated to different degrees. One reason that it is
rarely used at the moment might be that existing algorithms are not believed
to have a good balance between computational cost and detection strength in
practice. We therefore have developed the system CCE to have a framework
for the evaluation of existing and the design of new redundancy tests based on
ground joinability.

In this paper we deal with the following problem: Given $s = t$ and $(R, E, \succ)$,
is $s = t$ ground joinable (w.r.t. the extended signature semantics)? If not,
we are interested in a set $\{e_1 \mid c_1, \ldots, e_n \mid c_n\}$ of constrained equations that gives
precise information for the reason why: Each ground instance $\sigma(s) = \sigma(t)$ that
is not provably joinable in $(R, E, \succ)$ is covered by some $e_i \mid c_i$. Our system CCE
(“Covering Constrained Equations”) computes such coverings for an equation
$s = t$. If $\emptyset$ is a covering of $s = t$ then $s = t$ is ground joinable. A non-empty
covering may also be helpful for improving the efficiency of theorem proving.
Note that testing ground confluence is in general computationally much easier
than deciding whether $\emptyset$ is a covering for $s = t$ in case that $(R, E, \succ)$ is not
ground confluent (which is normally the case in theorem proving).

A simple sufficient approach to this problem goes back to [MN90]. In [AHL00]
we experimented with that test for $\succ$ being a lexicographic path ordering (LPO)
or a Knuth-Bendix ordering (KBO). We could show that this kind of redundancy
elimination can result in considerable speed-ups of equational theorem proving,
especially if AC-operators are involved in the problem.

The problem whether $s = t$ is ground joinable in $(R, E, \succ)$ is decidable if $\succ$ is
an LPO [CNNR98]. The decision procedure is more involved than the method of
[MN90] since it relies on a solver for LPO constraints. One may conjecture that 
the decision algorithm of [CNNR98] is too expensive to be useful as a redundancy 
criterion in theorem proving in practice. But it is natural to ask the following 
questions: 

1. What is the potential of this way of redundancy elimination, regardless of 
its cost? 

2. Can one weaken the decision procedure to get cheap sufficient tests for 
ground joinability? 

3. What is the trade-off between cost and sharpness of such weakenings? 

4. How do these weakenings compare with the method of [MN90]? 

Motivated by these questions we have implemented our system CCE. At 
the moment it contains various methods based on [CNNR98] and [MN90]. Fur- 
thermore, it provides some variants of a solver for LPO constraints based on 
the recent [NR99]. For KBO-constraints see [KV00]. The system is written in
ANSI-C and shares most of its code with our theorem prover Waldmeister 
[HJL99]. 

2 Notations 

We use standard notations whenever possible. We start with a fixed alphabet
$\mathcal{F}$ with operators $f \in \mathcal{F}$ of fixed arity and a fixed set $V$ of variables. Then
$\mathrm{Term}(\mathcal{F}, V)$ is the set of terms and $\mathrm{Term}(\mathcal{F})$ is the set of ground terms. Let
$\mathcal{F}^e = \mathcal{F} \cup \{0, \mathrm{succ}\}$ be the extended signature where $0, \mathrm{succ} \notin \mathcal{F}$ and $0$, $\mathrm{succ}$ are of
arity 0 and 1, respectively. Then $\sigma$ is an (extended) ground substitution if
$\sigma(x) \in \mathrm{Term}(\mathcal{F}^e)$ for all $x \in \mathrm{dom}(\sigma)$. An equation $e$ is $s = t$ with $s, t \in \mathrm{Term}(\mathcal{F}, V)$.
Let $\succ$ be a reduction ordering on $\mathrm{Term}(\mathcal{F}^e, V)$ which is total on $\mathrm{Term}(\mathcal{F}^e)$. For
an LPO we require $f >_{\mathcal{F}} \mathrm{succ} >_{\mathcal{F}} 0$ in the precedence for each $f \in \mathcal{F}$. We write
$(R, E, \succ)$ to denote sets of equations $R$ and $E$ with $R \subseteq {\succ}$. Then $\to$
is the rewrite relation induced by $R$ and by instances of equations in $E$ that are
orientable by $\succ$. We write $s \downarrow t$ to denote joinability in $(R, E, \succ)$ and we write
$s \Downarrow t$ if $\sigma(s) \downarrow \sigma(t)$ for each ground substitution $\sigma$. In this case $s = t$ is called
(extended) ground joinable. Then $s = t$ is ground joinable w.r.t. each signature
extension $\mathcal{F}_0$ of $\mathcal{F}$. A constraint $c$ is a Boolean expression (using $\land$, $\lor$, $\neg$) of
atoms $s = t$ and $s > t$ where $s, t \in \mathrm{Term}(\mathcal{F}, V)$, or $c$ is $\top$ (denoting $x = x$) or $\bot$
(denoting $\neg\, x = x$). We write $\sigma \models c$ if the ground substitution $\sigma$ satisfies $c$. A
constrained equation has the form $e \mid c$. We call $s = t \mid c$ trivial if either $s \equiv t$ or
$c$ is unsatisfiable.

3 Coverings of Constrained Equations 

We first make precise what we mean by a covering.

Definition 1. Let $e \mid c$ be a constrained equation where $e$ is $s = t$. A covering of
$e \mid c$ is a set $CE$ of constrained equations such that for each ground substitution
$\sigma$ such that $\sigma \models c$ either $\sigma(s) \downarrow \sigma(t)$ or $\sigma(e) \equiv \sigma(e')$ and $\sigma \models c'$ for some $e' \mid c'$
in $CE$. $CE'$ is a refinement of $CE$ if $CE'$ is a covering for each $e \mid c$ in $CE$.

3.1 Computing Coverings with Constraints on Variables 

The idea of [MN90] is to consider all constraints of the form $x_{\pi(i)} \,\rho_i\, x_{\pi(i+1)}$ for
all $i = 1, \ldots, n$. Here $\mathrm{Var}(s, t) = \{x_1, \ldots, x_n\}$ is the set of variables of $s = t$, $\pi$ is
a permutation and $\rho_i \in \{\succ, =\}$. If $s = t$ is joinable under all these constraints,
then $s = t$ is ground joinable. The appealing aspect of this method is that it is
quite easy to implement both for LPO and KBO. The implementations of the
orderings have to be slightly extended to admit the constraints on the variables.
No solver for ordering constraints is needed. Unfortunately, the number of cases
grows exponentially with the number of variables. We have therefore refined
the method such that a small subset $V_0 \subseteq \mathrm{Var}(s, t)$ is initially chosen and is
successively extended if ground joinability could not be shown. This was essential
to make the method fast in practice. A further drawback is that the chosen
constraints may be too weak to make a variable comparable to another subterm.
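
The case analysis can be made concrete with a short Python sketch. It is not taken from CCE: the function name `variable_orderings` and the representation of a chain as a tuple of blocks of equal variables are assumptions made for illustration. The sketch links adjacent variables of each permutation by either "greater" or "equal" and drops chains that describe the same ordering, which shows how quickly the number of cases grows.

```python
from itertools import permutations, product


def variable_orderings(variables):
    """Yield each distinct chain x1 rel x2 rel ... with rel in {'>', '='}.

    A chain is returned as a tuple of blocks (frozensets); variables in the
    same block are equal, and earlier blocks are greater than later ones.
    """
    seen = set()
    n = len(variables)
    for perm in permutations(variables):
        for rels in product((">", "="), repeat=max(n - 1, 0)):
            blocks = [[perm[0]]] if n else []
            for var, rel in zip(perm[1:], rels):
                if rel == "=":
                    blocks[-1].append(var)  # same block as its predecessor
                else:
                    blocks.append([var])    # strictly smaller: open a new block
            key = tuple(frozenset(block) for block in blocks)
            if key not in seen:
                seen.add(key)
                yield key


for names in ("x", "xy", "xyz", "wxyz"):
    print(len(names), "variables:", len(list(variable_orderings(names))), "cases")
```

For one to four variables the sketch reports 1, 3, 13 and 75 distinct cases, which is why starting from a small subset of the variables pays off in practice.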

3.2 Computing Coverings with Arbitrary Constraints 

Employing full ordering constraints enables a more powerful test. The set of 
coverings can be refined incrementally by analyzing potential rewrite steps. In 
[CNNR98] a confluence tree is used to organize the refinements: 

    $CE \cup \{e \mid c\} \vdash CE \cup CE_0$ iff $CE_0$ results from $e \mid c$ by
    (1) Constrained Rewriting or (2) Decomposition or (3) Instantiation.

On a more concrete level of our implementation, we basically have three
macros reflecting practical considerations. The first macro is Normalization, i.e.,
$e \mid c$ is simplified to $e' \mid c$ by $e \to_c e'$ where $\to_c$ means rewriting in $(R, E, \succ)$ where
$\succ$ is enhanced by $c$ as described in [NR99]. The second macro CRew is just
Constrained Rewriting as in [CNNR98] which introduces new constraints. The
third macro is Splitting $e \mid c$ into $e \mid c_i$ for $i = 1, \ldots, n$, where $c_i$ is in solved form.
It combines Decomposition and Instantiation.

We have implemented two strategies: In strategy CRew-Split macro CRew 
has priority over macro Splitting. In strategy Split-CRew it is vice versa. In both 
strategies the macro Normalization has highest priority. 

3.3 Constraint Solving 

The macro Splitting as well as the test for satisfiability is realized by a constraint 
solver. It is well known that checking LPO-constraints for satisfiability is NP- 
hard [N93]. So it is interesting to see how costly it is in practice. From the 
algorithms that are known from the literature we chose the method recently 
devised in [NR99]. It is there shown to be far superior in comparison to [N93].

The algorithm works roughly as follows: The given constraint $c$ is decomposed
in a way which is analogous to the definition of the LPO. By a restricted form
of transitivity new consequences are added. To ensure the termination of this
process a special notion of redundancy is used which can also be used to detect
early unsatisfiability. The resulting normal forms are called solved forms. The
constraint $c$ is satisfiable iff there is a solved form which contains no variable
cycle such as $(x > g(y)) \land (y > h(x))$.
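
The variable-cycle test at the end of this procedure can be sketched in Python. This is only an illustration under simplifying assumptions, not the solver of [NR99] or the CCE code: a solved form is assumed to be given as a list of atoms `(variable, term)` read as "variable > term", variables are strings, compound terms are tuples `(symbol, arg, ...)`, and the name `has_variable_cycle` is hypothetical.

```python
def variables_of(term):
    """Collect the variables occurring in a term."""
    if isinstance(term, str):
        return {term}
    result = set()
    for arg in term[1:]:
        result |= variables_of(arg)
    return result


def has_variable_cycle(atoms):
    """Check whether the atoms 'var > term' force a variable above itself."""
    graph = {}
    for var, term in atoms:
        graph.setdefault(var, set()).update(variables_of(term))
    visiting, done = set(), set()

    def visit(var):
        if var in done:
            return False
        if var in visiting:
            return True  # var is (transitively) greater than itself
        visiting.add(var)
        found = any(visit(nxt) for nxt in graph.get(var, ()))
        visiting.discard(var)
        done.add(var)
        return found

    return any(visit(var) for var in list(graph))


# x > g(y) and y > h(x): the example cycle from the text.
print(has_variable_cycle([("x", ("g", "y")), ("y", ("h", "x"))]))
# x > g(y) and y > h(z): no cycle.
print(has_variable_cycle([("x", ("g", "y")), ("y", ("h", "z"))]))
```

The check is a plain depth-first search on the "is greater than a term containing" relation between variables, so its cost is linear in the size of the solved form.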

As is proposed in [NR99] we use a backtracking approach to implement this 
technique. This allows a rather quick test for satisfiability; continuing the back- 
track search enables collecting all cycle-free solved forms. This is important for 
our application. There are some variations in organizing the search, such as 
adding new consequences as early or as late as possible, applying the checks for 
redundancy/unsatisfiability or variable cycles in different frequencies and so on. 
A specialized memoization technique enables us to share such redundancy infor- 
mation between different branches of the search. Our implementation in ANSI-C 
is about ten times faster than the PROLOG-program of [NR99]. 

4 Experimental Evaluation 

For evaluating the different algorithms we used test sets generated by Waldmeister
during proving 268 examples taken from the TPTP library [SS98]. The
recorded equations had passed different redundancy tests and the prover would 
directly profit from showing them ground joinable. In Table 1 we give the data 
for six representative examples and for the whole test set. Besides the number 
of tests a problem contains we show the number of equations which could be 
shown to be ground joinable by any of the three methods. The figures given in 
per cent tell how many of them were found by the respective method. 

Table 1. Comparison of three different methods for testing ground joinability: Variable
constraints [MN90] vs. two variants of confluence trees [CNNR98], cf. Sect. 3.2 (running
times for the tests in seconds on a SPARC UltraII/333 MHz).

            number of  found ground  var. constraints   Split-CRew         CRew-Split
Problem     all tests  joinable      strength   time    strength   time    strength     time
BOO007-2        2 097       337         82%      1.5       96%      4.7       90%       36.4
GRP187-1        1 599       103         63%      1.2       73%      9.8       61%       41.0
LAT023-1          566       119         78%      0.7       92%      2.0       82%       15.9
RNG009-7          978        22         18%      0.1       95%      0.7       86%        1.5
RNG027-5          455        61         43%     13.7       69%      9.3       89%      414.0
ROB006-1        1 704       512         26%      0.5       96%      5.6       83%       27.3
268 Ex.       109 632    12 011         61%    254.0       83%    999.0       76%   11 264.0

As we can see, the potential of ground joinability as a redundancy test lies
between 2 and 30 per cent depending on the domain of the proof task. The
method Split-CRew is in general stronger than CRew-Split which takes much
more time. The exception is the domain of non-associative rings: Here CRew-Split
is stronger at the price of a really large running time. By limiting the
branching width or the depth of the confluence tree we obtained weakenings of
both methods. This influenced especially the method CRew-Split: With a limit
of 8 for the height of the tree the running time drops to a quarter while
the strength is still at 60% (considering all 268 problems). With a limit of 6 the
strength is at 50% and the running time descends from over 11 000 to about
1 500 seconds. For Split-CRew the influence is not so pronounced, but here too,
with weakening the test the running time drops faster than the strength of the
test. For example, limiting the branching width to 7 the strength is still at 77%
with the running time going down from about 1 000 to 700 seconds. Overall,
the method of [MN90] has the best cost-benefit ratio, but in some domains
it is rather weak.
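
The cost-benefit comparison can be retraced from the summary row of Table 1 with a few lines of Python. The measure used here, ground joinable equations detected per second of test time, is our own simple reading of "cost-benefit ratio" and is an assumption, not a figure reported in the evaluation; the names `METHODS` and `found_per_second` are likewise illustrative.

```python
# Summary row of Table 1: 12 011 equations were found ground joinable
# by at least one method over all 268 examples.
TOTAL_JOINABLE = 12011

# method name -> (strength as a fraction, total running time in seconds)
METHODS = {
    "var. constraints": (0.61, 254.0),
    "Split-CRew": (0.83, 999.0),
    "CRew-Split": (0.76, 11264.0),
}


def found_per_second(strength, seconds, total=TOTAL_JOINABLE):
    """Detected ground joinable equations per second of test time."""
    return strength * total / seconds


rates = {name: found_per_second(*row) for name, row in METHODS.items()}
for name, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{name:18s} {rate:6.2f} equations per second")
```

Under this measure the variable-constraint method detects roughly three times as many equations per second as Split-CRew and more than thirty times as many as CRew-Split, which is consistent with the ranking stated above.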

The constraint solver is pretty fast in practice, at least it can cope well 
with the problems generated by the modules using the solver. We made detailed 
measurements for the six examples of Table 1. More than 90% of the calls are 
finished within one millisecond and only a tiny fraction needs more than 10 
milliseconds. Analyzing the longer runs, we noted that there is still room for 
improvements. Especially, when the solver is used in the collecting mode and 
the investigated constraint is highly nonlinear, identical solved forms may be 
generated dozens of times.

1 Introduction

RDL¹ simplifies clauses in a quantifier-free first-order logic with equality using a
tight integration between rewriting and decision procedures. On the one hand,
this kind of integration is considered the key ingredient for the success of
state-of-the-art verification systems, such as Acl2 [10], STeP [8], Tecton [9], and
Simplify [7]. On the other hand, obtaining a principled and effective integration
poses some difficult problems. Firstly, there are no formal accounts of the
incorporation of decision procedures in rewriting. This makes it difficult to reason
about basic properties such as soundness and termination of the implementation
of the proposed schema. Secondly, most integration schemas are targeted
to a given decision procedure and do not allow new decision procedures to be
plugged easily into the rewriting activity. Thirdly, only a tiny portion of the proof
obligations arising in many practical verification efforts falls exactly into the
theory decided by the available decision procedure. RDL solves the problems
above as follows:

1. RDL is based on CCR (Constraint Contextual Rewriting) [1,2], a formally 
specified integration schema between (ordered) conditional rewriting and a 
satisfiability decision procedure [11]. RDL inherits the properties of sound- 
ness [1] and termination [2] of CCR. It is also fully automatic. 

2. RDL is an open system which can be modularly extended with new decision 
procedures provided these offer certain interface functionalities (see [2] for 
details). 

In its current version, RDL offers ‘plug-and-play’ decision procedures for the 
theories of Universal Presburger Arithmetic over Integers (UPAI), Universal 
Theory of Equality (UTE), and UPAI extended with uninterpreted function 
symbols [13]. 

3. RDL implements instances of a generic extension schema for decision
procedures [3]. The key ingredient of such a schema is a lemma speculation
mechanism which ‘reduces’ the validity problem of a given theory to the
validity problem of one of its sub-theories for which a decision procedure is
available. The proposed mechanism is capable of generating lemmas which
are entailed by the union of the theory decided by the available decision
procedure and the facts stored in the current context. Three instances of the
extension schema lifting a decision procedure for UPAI are available. First,
augmentation copes with user-defined functions whose properties can be
expressed by conditional lemmas. Second, affinization is a mechanism for the
‘on-the-fly’ generation of lemmas to handle a significant class of formulae in
the theory of Universal Arithmetic over Integers (UAI). Third, a combination
of augmentation and affinization puts together the flexibility of the former
with the automation of the latter. Finally, RDL can be extended with new
lemma speculation mechanisms provided these meet certain requirements
(see [3] for details).

¹ The system is available via the Constraint Contextual Rewriting Project Home Page
at http://www.mrg.dist.unige.it/ccr.

R. Goré, A. Leitsch, and T. Nipkow (Eds.): IJCAR 2001, LNAI 2083, pp. 663-669, 2001.
Springer-Verlag Berlin Heidelberg 2001

Since extensions of quantifier-free first-order logic with equality are useful in 
practically all verification efforts, RDL can be seen as an open reasoning module 
which can be integrated in larger verification systems. In fact, most state-of- 
the-art verification systems feature similar components, e.g. Acl2’s simplifier, 
STeP validity checker, Tecton’s integration of contextual rewriting and a de- 
cision procedure for UPAI, and Simplify developed within the Extended Static 
Checking project. 

2 A Motivating Example 

Consider the problem of showing the termination of a function to normalize
conditional expressions in propositional logic as described in Chap. IV of
[5]. The argument in the proof of termination is based on exhibiting a measure
function that decreases (according to a given ordering) at each recursive
call of the function. $ms$ (reported in [12]) is one such function: $ms(a) = 1$ and
$ms(\mathrm{If}(x, y, z)) = ms(x) + ms(x)\,ms(y) + ms(x)\,ms(z)$, where $\mathrm{If}$ is the ternary
constructor for non-atomic conditional expressions, $a$ is an atomic conditional
expression, $x$, $y$, and $z$ are conditional expressions, and juxtaposition denotes
multiplication. One of the proof obligations formalizing the ‘decreaseness’ argument
above is

    $ms(\mathrm{If}(u, \mathrm{If}(v, y, z), \mathrm{If}(w, y, z))) < ms(\mathrm{If}(\mathrm{If}(u, v, w), y, z))$,    (1)

where $<$ is the ‘less-than’ relation over integers. In order to prove the validity of
(1), we check the unsatisfiability of its negation. By rewriting the l.h.s. and the
r.h.s. (of the negation of (1)) with the definition of $ms$, we obtain:

    $u + uv + uvy + uvz + uw + uwy + uwz \;\ge\;$
    $u + uv + uw + uy + uvy + uwy + uz + uvz + uwz$,    (2)

where $u$, $v$, $w$, $y$, and $z$ abbreviate $ms(u)$, $ms(v)$, $ms(w)$, $ms(y)$, and $ms(z)$, respectively.
Then, we perform all the possible cancellations in (2) and we obtain:

    $ms(u)\,ms(y) + ms(u)\,ms(z) \le 0$.    (3)

We assume the availability of the following two facts:

    $ms(E) > 0$,    (4)

    $(X > 0 \land Y > 0) \Rightarrow XY > 0$,    (5)

for each conditional expression $E$ and for each pair of numbers $X$ and $Y$. Then,
consider the following two instances of (5), obtained by matching the conclusion
of (5) with the first and second summand in the l.h.s. of (3):

    $(ms(u) > 0 \land ms(y) > 0) \Rightarrow ms(u)\,ms(y) > 0$,    (6)

    $(ms(u) > 0 \land ms(z) > 0) \Rightarrow ms(u)\,ms(z) > 0$.    (7)

In order to relieve the hypotheses of (6) and (7), it is sufficient to instantiate (4)
three times, namely $ms(u) > 0$, $ms(y) > 0$, and $ms(z) > 0$. Finally, it is trivial
to detect the unsatisfiability of (3) and the conclusions of (6) and (7).
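
The arithmetic behind this derivation can be checked mechanically. The following Python sketch is an illustration only and has nothing to do with how RDL itself works; the tuple encoding of conditional expressions and the helper names `If`, `ms` and `obligation_gap` are assumptions. It evaluates both sides of (1) and confirms that their difference is exactly the sum cancelled down to in (3).

```python
def If(x, y, z):
    """Build a non-atomic conditional expression (tuple encoding)."""
    return ("If", x, y, z)


def ms(expr):
    """Measure function: ms(a) = 1 for atoms,
    ms(If(x, y, z)) = ms(x) + ms(x)ms(y) + ms(x)ms(z)."""
    if isinstance(expr, str):
        return 1
    _, x, y, z = expr
    return ms(x) + ms(x) * ms(y) + ms(x) * ms(z)


def obligation_gap(u, v, w, y, z):
    """Right-hand side of (1) minus its left-hand side."""
    lhs = ms(If(u, If(v, y, z), If(w, y, z)))
    rhs = ms(If(If(u, v, w), y, z))
    return rhs - lhs


# Atomic arguments: lhs = 7, rhs = 9, so the measure decreases by 2.
print(obligation_gap("a", "a", "a", "a", "a"))

# Nested arguments: the gap always equals ms(u)ms(y) + ms(u)ms(z).
u, y, z = If("a", "b", "c"), If("d", "e", "f"), "g"
print(obligation_gap(u, "v", "w", y, z) == ms(u) * ms(y) + ms(u) * ms(z))
```

Since $ms$ is positive on every expression, the gap is positive, which is the content of facts (4) to (7).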

Three (cooperating) reasoning capabilities are required to automate the 
above reasoning: (i) rewriting, (ii) satisfiability checking and normalization in a
given theory, and (iii) ground lemma speculation (in a sense that will be made
clear later). The first is used to simplify formulae, e.g. unfolding the definition 
of ms in (1). The second presents two aspects: the simplification of a literal, 
e.g. canceling out common terms in (2), and the check for the unsatisfiability of 
(conjunctions of) literals, e.g. the conclusions of lemmas (6) and (7) with (3). 
The third is the capability of supplying instances of valid facts to (partially) in- 
terpret user-defined function symbols occurring in the current formula, e.g. two 
instances of (4) are used to relieve hypotheses of (6). 

3 Architecture 

RDL features a tight integration (based on CCR) of three modules implementing 
the reasoning capabilities mentioned above: a module for ordered conditional 
rewriting, a satisfiability decision procedure, and a module for lemma speculation. 

In the following, let cl be the clause to be simplified and p be a literal in cl 
which is going to be rewritten. The context C associated to p is the conjunction 
of the negation of the literals occurring in cl except p. Let T be the theory 
decided by the decision procedure. 

The decision procedure. For efficiency reasons, this module is state-based,
incremental, and resettable [11]. The context $C$ is stored by a specialized data
structure in the state of the decision procedure. There are three functionalities.
First, cs-unsat characterizes a set of inconsistent (in $T$) contexts whose
inconsistency can be checked by means of computationally inexpensive checks.
Second, given a literal $l$ and the current context $C$, cs-simp computes the new
context $C'$ resulting from the addition of $l$ to $C$ in such a way that $C'$ is entailed
by the conjunction of $l$ and $C$ in $T$. Third, cs-norm computes a normal
representation $p'$ of $p$ w.r.t. $T$ and the information stored in $C$. This functionality must
be compatible with rewriting, i.e. it is required that $p' \preceq p$ where $\preceq$ denotes a
total term ordering on ground literals. As an example, a decision procedure for
UPAI can implement cs-norm by collecting like terms in literals whose top-most
predicate symbol is $<$ and rewriting the resulting literal by using the equalities
entailed by the current context.

Constraint Contextual Rewriting. The rewriter provides the functionality ccr.
It handles conditional rules of the form $h_1 \land \cdots \land h_n \Rightarrow (l = r)$, where $l$ and $r$ are
terms, and $h_1, \ldots, h_n$ are literals. Assume $r\sigma \prec l\sigma$ for a ground substitution $\sigma$
(otherwise, if $l\sigma$ is different from $r\sigma$, swap $l$ with $r$ in the following). Given $p[l\sigma]$,
ccr returns $p[r\sigma]$ if $h_1\sigma, \ldots, h_n\sigma$, and $p[r\sigma]$ are smaller (w.r.t. $\prec$) than $p[l\sigma]$, and
for $i = 1, \ldots, n$ either $h_i\sigma$ is (recursively) rewritten to true by invoking ccr² or
by checking whether $h_i\sigma$ is entailed by $C$ (this is done by invoking cs-unsat so
as to check that the negation of $h_i\sigma$ is inconsistent with $C$). There are two other
means of rewriting. Firstly, $p$ is rewritten to false (true) if cs-unsat checks
that $p$ (the negation of $p$, resp.) is inconsistent with $C$. Secondly, $p$ is rewritten
to $p'$ if $p'$ has been obtained by invoking cs-norm.
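
How the rewriter can lean on the three functionalities of the decision procedure is sketched below in Python. This is a toy model, not RDL's Prolog implementation or its actual interface: the theory is purely propositional, the procedure is written functionally rather than state-based, and the names `Literal`, `ToyDecisionProcedure` and `rewrite_with_context` are assumptions introduced for the example. Only the method names cs_simp, cs_unsat and cs_norm mirror the functionalities named in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    atom: str
    positive: bool = True

    def negated(self):
        return Literal(self.atom, not self.positive)


class ToyDecisionProcedure:
    """Toy stand-in: the context is a set of literals and the only
    inconsistency detected is an atom occurring with both polarities."""

    def __init__(self, context=()):
        self.context = frozenset(context)

    def cs_simp(self, literal):
        """Return the procedure whose context is extended by `literal`."""
        return ToyDecisionProcedure(self.context | {literal})

    def cs_unsat(self):
        """Cheap inconsistency check on the stored context."""
        return any(lit.negated() in self.context for lit in self.context)

    def cs_norm(self, literal):
        """Normal form of a literal; the toy theory has nothing to normalise."""
        return literal


def rewrite_with_context(procedure, p):
    """Rewrite p to False or True when the context decides it, else normalise."""
    if procedure.cs_simp(p).cs_unsat():
        return False  # p is inconsistent with the context
    if procedure.cs_simp(p.negated()).cs_unsat():
        return True   # the negation of p is inconsistent with the context
    return procedure.cs_norm(p)


context = ToyDecisionProcedure([Literal("a")])
print(rewrite_with_context(context, Literal("a", positive=False)))
print(rewrite_with_context(context, Literal("a")))
print(rewrite_with_context(context, Literal("b")))
```

A real instance would replace the polarity check with, for example, a congruence closure or a linear arithmetic procedure while keeping the same three entry points.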

Lemma speculation. Three instances of the lemma speculation mechanism de- 
scribed in [3] are implemented in RDL. All the instances share the goal of feeding 
the decision procedure with new facts about function symbols which are other- 
wise uninterpreted in T. More precisely, they inspect the context C and return 
a set of ground facts entailed by C using T as the background theory. Further- 
more, these facts must enjoy some properties to ensure termination (see [3,2] for 
details). 

The simplest form of lemma speculation is augment [6,1,2,3], which consists
of selecting and instantiating lemmas from a set of available valid formulae in or- 
der to obtain ground facts whose conclusions can be readily used by the decision 
procedure. As an example, consider a decision procedure for UPAI implemented 
by means of the Fourier-Motzkin method. Here the basic operation is to elim- 
inate one variable from a set of inequalities by means of a linear combination 
of two inequalities. Then, augment finds instances of the conclusions among the 
conditional lemmas which can promote further variable eliminations. There are 
two crucial problems. Firstly, we must relieve hypotheses of lemmas in order to 
be able to send their conclusions to the decision procedure. We solve this prob- 
lem by rewriting each hypothesis to true (if possible). This is done by invoking 
ccr and it implies that the rewriter and the decision procedure are mutually 
recursive. The other problem is the presence of extra variables in the hypotheses 
(w.r.t. the conclusion) of lemmas. RDL avoids this problem by requiring that 
the conclusion contains all the variables occurring in the lemma and that all 
the variables get instantiated by matching the conclusion of the lemma against 
the largest (according to $\prec$) literal in $C$. As an example of how augment works,
recall that (6) and (7) are generated by matching the conclusion of lemma (5) 
against (3) twice. 
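
The variable elimination step mentioned above can be sketched as follows. The code is a minimal illustration of Fourier-Motzkin elimination over the rationals and is not RDL's implementation; a decision procedure for integer arithmetic needs additional care that is omitted here. The encoding of an inequality as a pair `(coefficients, bound)` meaning "sum of coefficient times variable is at most bound" and the function name `eliminate` are assumptions.

```python
def eliminate(inequalities, var):
    """One Fourier-Motzkin step: remove `var` from constraints of the form
    sum(coeffs[v] * v) <= bound by combining upper with lower bounds."""
    uppers, lowers, rest = [], [], []
    for coeffs, bound in inequalities:
        c = coeffs.get(var, 0)
        if c > 0:
            uppers.append((coeffs, bound))   # bounds var from above
        elif c < 0:
            lowers.append((coeffs, bound))   # bounds var from below
        else:
            rest.append((coeffs, bound))
    for up_coeffs, up_bound in uppers:
        for lo_coeffs, lo_bound in lowers:
            a, b = up_coeffs[var], -lo_coeffs[var]  # both positive
            combined = {}
            for v in set(up_coeffs) | set(lo_coeffs):
                if v == var:
                    continue
                coeff = b * up_coeffs.get(v, 0) + a * lo_coeffs.get(v, 0)
                if coeff != 0:
                    combined[v] = coeff
            # b * (upper) + a * (lower) cancels var.
            rest.append((combined, b * up_bound + a * lo_bound))
    return rest


# x <= y, y <= 5, x >= 7 written as  x - y <= 0,  y <= 5,  -x <= -7.
system = [({"x": 1, "y": -1}, 0), ({"y": 1}, 5), ({"x": -1}, -7)]
without_x = eliminate(system, "x")
without_xy = eliminate(without_x, "y")
print(without_xy)
```

After both eliminations a single variable-free constraint remains whose bound is negative, i.e. it reads 0 <= -2, so the system is unsatisfiable. A lemma instance supplied by augment plays the role of one more such inequality that enables a further combination.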

² RDL performs no case splitting while relieving hypotheses of conditional lemmas.

If a suitable set of lemmas is defined, augment increases dramatically the
effectiveness of the decision procedure. Unfortunately, devising such a suitable
set is a time consuming activity. This problem can be solved in some important
special cases. In the current version of RDL, affinize implements the
‘on-the-fly’ generation of lemmas about multiplication over integers. To understand how
affinize works, consider the non-linear inequality $XY \le -1$ (where $X$ and $Y$
range over integers). By resorting to its geometrical interpretation, it is easy to
verify that $XY \le -1$ is equivalent to $(X \ge 1 \land Y \le -1) \lor (X \le -1 \land Y \ge 1)$.
To avoid case splitting, we observe that the half-planes represented by $X \ge 1$
and $X \le -1$ as well as those represented by $Y \le -1$ and $Y \ge 1$ are
non-intersecting. This allows us to derive the following four lemmas: $X \ge 1 \Rightarrow Y \le -1$,
$X \le -1 \Rightarrow Y \ge 1$, $Y \ge 1 \Rightarrow X \le -1$, and $Y \le -1 \Rightarrow X \ge 1$. This process can
be generalized to non-linear inequalities which can be put in the form $XY \le K$
(where $K$ is an integer) by factorization. The generated (conditional) lemmas
are used as for augment.
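
The equivalence and the four derived lemmas are easy to confirm by exhaustive testing on a small range of integers. The Python sketch below does only that; it is a sanity check written for this exposition, not part of affinize, and the function name `check_affinize` and the chosen range are assumptions.

```python
def check_affinize(bound=6):
    """Check, for all integers X, Y in [-bound, bound], that XY <= -1 is
    equivalent to the case split and implies the four linear lemmas."""
    for x in range(-bound, bound + 1):
        for y in range(-bound, bound + 1):
            product_negative = x * y <= -1
            case_split = (x >= 1 and y <= -1) or (x <= -1 and y >= 1)
            if product_negative != case_split:
                return False
            if product_negative:
                lemmas = [
                    (not x >= 1) or y <= -1,   # X >= 1  =>  Y <= -1
                    (not x <= -1) or y >= 1,   # X <= -1 =>  Y >= 1
                    (not y >= 1) or x <= -1,   # Y >= 1  =>  X <= -1
                    (not y <= -1) or x >= 1,   # Y <= -1 =>  X >= 1
                ]
                if not all(lemmas):
                    return False
    return True


print(check_affinize())
```

Note that the equivalence relies on the non-strict comparisons: with strict ones the point X = 1, Y = -2 would satisfy XY < -1 but not X > 1.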

On the one hand affinize can be seen as a significant improvement over
augment since it does not require any user intervention. On the other hand it
fails to apply when inequalities cannot be transformed into a form suitable for
affinization. RDL combines augmentation and affinization by considering the
function symbols occurring in the context $C$, i.e. the top-most function symbol
of the largest (according to $\prec$) literal in $C$ triggers the invocation of either
affinization or augmentation.

4 Experiments 

RDL must be judged w.r.t. its effectiveness in simplifying (and possibly checking 
the validity of) proof obligations arising in practical verification efforts where 
decision procedures play a crucial role. Hence, standard benchmarks for theorem 
provers (e.g. TPTP) are not in the scope of RDL. We are currently building 
a corpus of proof obligations extracted from the literature as well as examples 
available for similar components integrated in verification systems. The problems 
selected for the corpus are representative of disparate verification scenarios and 
are considered difficult for current state-of-the-art verification systems.

Table 1 reports the results of our computer experiments. Problem lists the
available lemmas³ (if any) and the formula to be decided. $\vdash$ is the binary relation
characterizing the deductive capability of RDL (we have that $\vdash$ is contained in
$\models_T$, where $T$ is the theory decided by the available decision procedure extended
with the available facts). The last column records the successful attempt (time is
expressed in msec) to solve a problem by RDL.⁴

RDL solves problems 1 and 2 with a decision procedure for UTE. In the for- 
mer, the decision procedure is used to derive equalities entailed by the context 
which are used as rewrite rules and enable the use of the available lemma. The or- 
dered rewriting engine implemented by RDL is a key feature to successfully solve 
problem 2 since this form of rewriting allows to handle usually non-orientable 

® Capitalized letters denote implicitly universally quantified variables. 

Benchmarks run on a 600 MHz Pentium III running Linux. RDL is implemented in 

Prolog and it was compiled using Sicstus Prolog, version 3.8. 
Table 1. Experimental Results 

#   Problem                                                          RDL (msec) 
--  ---------------------------------------------------------------  ---------- 
1   f(A) = f(B) ⇒ (r(g(A, B), A) = A) ⊢                              26 
    r(g(y, z), x) = x ∨ ¬(g(x, y) = g(y, z)) ∨ ¬(y = x) 

2   A * B = B * A, (¬(C = 0)) ⇒ (rem(C * U, U) = 0) ⊢                109 
    rem(y * z, x) = 0 ∨ ¬(x * y = z * y) ∨ x = 0 

3   (A > 0) ⇒ (rem(A * B, A) = 0) ⊢ rem(x * y, x) = 0 ∨ x < 0        12 

4   min(A) < max(A) ⊢                                                12 
    ¬(k > 0) ∨ ¬(l > 0) ∨ ¬(l < min(b)) ∨ ¬(0 < k) ∨ l < max(b) + k 

5   (memb(A, B)) ⇒ (len(del(A, B)) < len(B)) ⊢                       17 
    ¬(w > 0) ∨ ¬(k > 0) ∨ ¬(z > 0) ∨ ¬(v > 0) ∨ ¬(memb(z, b)) 
    ∨ ¬(w + len(b) < k) ∨ w + len(del(z, b)) < k + v 

6   (0 < A) ⇒ (B < A * B), 0 < ms(C) ⊢                               72 
    ms(c) + ms(d)^2 + ms(b)^2 < 
    ms(c) + ms(b)^2 + 2ms(d)^2 * ms(b) + ms(d)^4 

7   A > 4 ⇒ (A^2 < 2^A) ⊢ ¬(c > 4) ∨ ¬(b < c^2) ∨ ¬(2^c < b)         14 

8   (max(A, B) = A) ⇒ (min(A, B) = B), (p(C)) ⇒ (f(C) < g(C)) ⊢      114 
    ¬(p(x)) ∨ ¬(z < f(max(x, y))) ∨ ¬(0 < min(x, y)) 
    ∨ ¬(x < max(x, y)) ∨ ¬(max(x, y) < x) ∨ z < g(x) + y 

9   0 < ms(C) ⊢                                                      63 
    ms(c) + ms(d)^2 + ms(b)^2 < 
    ms(c) + ms(b)^2 + 2ms(d)^2 * ms(b) + ms(d)^4 

10  ⊢ x > 0 ⇒ x^[?] − x + 1 [?] 0                                    40 
    ([?] marks an exponent and a relation symbol illegible in the source) 


rewrite rules such as A * B = B * A. RDL solves problem 3 with a decision 
procedure for UPAI. In fact, the available lemma is applied once its instantiated 
condition, namely x > 0, is relieved by the decision procedure (it is 
straightforward to check the inconsistency of x > 0 and the literal x < 0 in the 
context). RDL solves problems 4, 5, 6, and 7 with a decision procedure for UPAI 
and augment. In particular, the formula of problem 6 is a non-linear formula 
whose validity is established by RDL in a way similar to the example in 
Section 2. RDL solves problem 8 with the combination of decision procedures 
for UPAI and for UTE. RDL solves problems 9 and 10 with the combination 
of a decision procedure for UPAI, augment, and affinize. The lemma about 
multiplication (i.e. 0 < I ⇒ J < I * J) is supplied in problem 6 but not in 
problem 9, so only the combination of augment and affinize can solve problem 9. 
Finally, problem 10 shows the importance of the context in which proof 
obligations are proved (since RDL does not case-split): without x > 0, 
augment and affinize would not be able to solve problem 10. 

As a matter of fact, the online version of STeP fails to solve all of the 
problems reported in Table 1. However, most of the problems are successfully 
solved by the improved version of STeP described in [4]. Simplify successfully 
solves problems 1 to 8 thanks to a Nelson-Oppen combination of decision 
procedures and an incomplete matching algorithm which is capable of 
instantiating (valid) universally quantified clauses. However, it does not solve 
problems 9 and 10 since it is unable to handle non-linear facts without 
user-supplied lemmas (such as, e.g., 0 < I ⇒ J < I * J in problem 6). Finally, 
SVC fails to solve all the problems involving augmentation and affinization 
since it does not provide a mechanism to take into account facts which partially 
interpret user-defined function symbols. 
Abstract. When Prolog programs that manipulate lists to manage a 
collection of resources are rewritten to take advantage of the linear logic 
resource management provided by the logic programming language Lolli, 
they can obtain dramatic speedup. Thus far this has been demonstrated 
only for “toy” applications, such as n-queens. In this paper we present 
such a reimplementation of the lean connection-calculus prover leanCoP 
and obtain a theorem prover for first-order classical logic which rivals or 
outperforms state-of-the-art provers on a significant body of problems. 



1 Introduction 

The development of logic programming languages based on intuitionistic [11] and 
linear logic [6] has been predicated on two principal assumptions. The first, and 
the one most argued in public, has been that, given the increased expressivity, 
programs written in these languages are more perspicuous, more natural, and 
easier to reason about formally. The second assumption, which the designers have 
largely kept to themselves, is that by moving the handling of various program 
features into the logic, and hence from the term level to the formula level, we 
would expose them to the compiler, and, thus, to optimization. In the end, 
we believed, this would yield programs that executed more efficiently than the 
equivalent program written in more traditional logic programming languages. 
Until now, this view has been downplayed as most of these new languages have 
thus far been implemented only in relatively inefficient, interpreted systems. 

With the recent development of compilers for languages such as λProlog [13] 
and Lolli [7], however, we are beginning to see this belief justified. In the case 
of Lolli, we are focused on logic programs which have used a term-level list as 
a sort of bag from which items are selected according to some rules. In earlier 
work we showed that when such code is rewritten in Lolli, allowing the elements 
in the list to instead be stored in the proof context (with the underlying rules 
of linear logic managing their consumption), substantial speedups can occur. To 
date, however, that speedup has been demonstrated only on the execution of 
simple, “toy” applications, such as an n-queens problem solver [7]. 

* This paper reports work done while the second author was on a sabbatical-leave 
from Kobe University. 

Now we have turned our attention to a more sophisticated application: 
theorem proving. We have reimplemented the leanCoP connection-calculus 
theorem prover of Otten and Bibel [14] in Lolli. This “lean” theorem prover has 
been shown to have remarkably good performance relative to state-of-the-art 
systems, particularly considering that it is implemented in just a half-page of 
Prolog code. The reimplemented prover, which we call lolliCoP, is of comparable 
size, and, when compiled under LLP (the reference Lolli compiler [7]), provides a 
speedup of 40% over leanCoP. On many of the hardest problems that both can 
solve, it is roughly the same speed as the Otter theorem prover [8]. (Both 
leanCoP and lolliCoP solve a number of problems that Otter cannot. Conversely, 
Otter solves many problems that they cannot. On simpler problems that both 
solve, Otter is generally much faster than leanCoP and lolliCoP.) 

While this is a substantial improvement, it is not the full story. LLP is a 
relatively naive, first-generation compiler and run-time system, whereas it is 
being compared to a program compiled by a far more mature and optimized 
Prolog compiler (SICStus Prolog 3.7.1). When we adjust for this difference, we 
find that lolliCoP is more than twice as fast as leanCoP, and solves (within a 
limited time allowance) more problems from the test library. Also, when the 
program is rewritten in Lolli, two simple improvements become obvious. When 
these changes are made to the program, performance improves by a further factor 
of three, and the number of problems solved expands even further. 



1.1 Organization 

The remainder of this paper is organized as follows: Section 2 gives a brief 
introduction to the connection calculus for first-order classical logic; Section 3 
describes the leanCoP theorem prover; Section 4 gives a brief introduction to 
linear logic, Lolli, and the LLP compiler; Section 5 introduces lolliCoP; Section 6 
presents the results and analysis of various performance tests and comparisons; 
and Section 7 presents the two optimizations mentioned above. 

2 Connection-Calculus Theorem Proving 

The connection calculus [2] is a matrix proof procedure for clausal first-order 
classical logic. (Variations have been proposed for other logics, but this is its 
primary application.) The calculus, which uses a positive representation, proving 
matrices of clauses in disjunctive normal form, has been utilized in a number 
of theorem proving systems, including KoMeT [3], Setheo and E-Setheo [9, 
12]. It features two principal rules, extension and reduction. The extension step, 
which corresponds roughly to backchaining, consists of matching the complement 
of a literal in the active goal clause with the head of some clause in the matrix. 
The body of that clause is then proved, as is the remainder of the original clause. 
(start) 
    Γ | ∅ ⊢ A1, ..., An 
    ------------------- 
    [conclusion not legible in the source] 
    (provided C ∈ Γ, C[t/x] = {A1, ..., An} for some t, n ≥ 0) 

(extension0) 
    ------- 
    Γ | Π ⊢ 

(extension1) 
    Γ | Li, Π ⊢ Li1, ..., Lim      C, Γ | Π ⊢ L1, ..., Li-1, Li+1, ..., Ln 
    ---------------------------------------------------------------------- 
    C, Γ | Π ⊢ L1, ..., Ln 
    (provided C is ground, C = {L̄i, Li1, ..., Lim}, 1 ≤ i ≤ n, and m ≥ 0) 

(extension2) 
    C, Γ | Li, Π ⊢ Li1, ..., Lim      C, Γ | Π ⊢ L1, ..., Li-1, Li+1, ..., Ln 
    ------------------------------------------------------------------------- 
    C, Γ | Π ⊢ L1, ..., Ln 
    (provided C is not ground, C[t/x] = {L̄i, Li1, ..., Lim} for some t, 
    1 ≤ i ≤ n, and m ≥ 0) 

(reduction, 1 ≤ i ≤ n) 
    Γ | L̄i, Π ⊢ L1, ..., Li-1, Li+1, ..., Ln 
    ---------------------------------------- 
    Γ | L̄i, Π ⊢ L1, ..., Ln 

Fig. 1. A deduction system for the derivation relation of the Connection Calculus 

For the duration of the proof of the body of the matching clause, however, the 
literal that matched is added to a secondary data structure called the path. If 
at a later point the complement of a literal being matched occurs in the path, 
that literal need not be proved. This short-circuiting of the proof constitutes the 
reduction step. Search terminates when the goal clause is empty. Finally, note 
that in the extension step, if the clause matched is ground, it is removed from 
the matrix during the subproof. 

Figure 1 shows a deduction system for the derivation relation. Two versions 
of the extension rule are given, depending on whether the matched clause is 
ground or not. A third version handles the termination case. In the core rules of 
this system, the left-hand side of the derivation has two parts: the matrix, Γ, is 
a multiset of clauses; the path, Π, is a multiset of literals. The goal clause on the 
right-hand side is a sequence of literals. Note that the calculus is more general 
than necessary. We can, without loss of completeness, restrict the selection of a 
literal from the goal clause to the leftmost literal (i.e., restrict i = 1). 

A derivation is a deduction tree rooted at an application of the start rule, for 
some positive clause C, with instances of extension0 and premiseless instances 
of reduction at the leaves. In an implementation, the choice of terms t in the 
start and extension2 rules would be delayed via unification in the usual manner. 
Otten and Bibel provide an alternate, isomorphic, formulation of the calculus by 
way of an operational semantics in which substitutions are made explicit [14]. 



3 The leanCoP Theorem Prover 

The leanCoP theorem prover of Otten and Bibel [14] is a Prolog program, shown 
in Figure 2, providing a direct encoding of the calculus shown in Figure 1. In 
this implementation clauses, paths, and matrices are represented as Prolog lists. 
Atomic formulas are represented with Prolog terms. A negated atom is rep- 
resented by applying the unary - operator to the corresponding term. Prolog 
prove(Mat) :- prove(Mat,1). 
prove(Mat,PathLim) :- 
    append(MatA,[Cla|MatB],Mat), \+member(-_,Cla), 
    append(MatA,MatB,Mat1), prove([!],[[-!|Cla]|Mat1],[],PathLim). 
prove(Mat,PathLim) :- 
    \+ground(Mat), PathLim1 is PathLim+1, prove(Mat,PathLim1). 
prove([],_,_,_). 
prove([Lit|Cla],Mat,Path,PathLim) :- 
    (-NegLit=Lit ; -Lit=NegLit) -> 
    ( member_oc(NegLit,Path) ; 
      append(MatA,[Cla1|MatB],Mat), copy_term(Cla1,Cla2), 
      append_oc(ClaA,[NegLit|ClaB],Cla2), append(ClaA,ClaB,Cla3), 
      ( Cla1==Cla2 -> append(MatB,MatA,Mat1) 
                    ; length(Path,K), K<PathLim, 
                      append(MatB,[Cla1|MatA],Mat1) 
      ), prove(Cla3,Mat1,[Lit|Path],PathLim) 
    ), prove(Cla,Mat,Path,PathLim). 

Fig. 2. The leanCoP theorem prover of Otten and Bibel 



variables are used to represent object variables. This last fact causes some com- 
plications, discussed below. 

The first evident difference between the calculus and its implementation is 
that an extra value, an integer path-depth limit, is added to each of the Prolog 
predicates. It is used to implement iterative deepening based on the maximum 
allowed path length, which is necessary to ensure completeness in the first-order 
case, due to Prolog’s depth-first search strategy. When prove/1 is called, it sets 
the initial path limit to 1 and calls prove/2, which in turn selects (without loss 
of generality) a purely positive start clause. 

The selection of the clause, Cla, is done using a trick of Prolog: since the 
predicate append(A,B,C) holds if the list C results from appending list B to list 
A, append(A,[D|B],C) (in which [D|B] is a list that has D as its first item, 
followed by the list B) will hold if D is an element of C and if, further, A is the list 
of items preceding it and B is the list of items following it. Thus Prolog can, in 
one predicate, select an element from an arbitrary position in a list and identify 
all the remaining elements in the list, which result from appending A and B. 
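
The append-based selection just described can be tried in isolation. The following is a minimal sketch, not part of leanCoP itself; the predicate name select_by_append/3 is hypothetical and is introduced only for illustration.

```prolog
:- use_module(library(lists)).

% select_by_append(?Elem, +List, ?Rest)
% Hypothetical helper illustrating the technique: Elem is some element of
% List, and Rest is List with that one occurrence removed.
select_by_append(Elem, List, Rest) :-
    append(Before, [Elem|After], List),   % split List around Elem
    append(Before, After, Rest).          % rejoin the pieces on either side
```

On backtracking, a query such as select_by_append(E, [a,b,c], R) enumerates each element of the list together with the remaining elements, in left-to-right order. Each solution builds new list cells, which is the heap cost the text refers to.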

This technique is used to select literals from clauses and clauses from matri- 
ces throughout leanCoP. While it is an interesting trick, it relies on significant 
manipulation and construction of list structures on the heap. It is precisely cer- 
tain uses of this trick which will be replaced by linear logic resource management 
at the formula level in lolliCoP. 

To ensure that the selected clause is purely positive, the code checks that the 
clause contains no negated terms (terms of the form -_, where the underscore is a 
wildcard). This is done using Prolog’s negation-as-failure operator: \+. Once this 
is confirmed, the proof is started using a dummy (unit) goal clause, !, which will 
cause the selected clause to become the goal clause in the next step. This is done 
to avoid duplicating some bookkeeping code already present in the general case 
in prove/4, which implements the core of the prover. Note that the similarity 
of appearance to the Prolog cut operator is coincidental. 

Should the call to prove/4 at the end of the first clause of prove/2 fail, then, 
provided this is not a purely propositional problem (that is, provided the matrix 
is not entirely ground), the second clause of prove/2 will cause the entire 
process to repeat, but with a path-depth limit one larger. 

The first clause of prove/4 implements the termination case, extension0, 
and is straightforward. The second implements the remaining rules. This clause 
begins by selecting, without loss of completeness, the first literal, Lit, from the 
goal clause. If the complement of this literal as computed by the first line of the 
body of the clause matches a literal in the Path, then the system attempts to 
apply an instance of the reduction rule, jumping to the last line of the clause, 
where it recursively proves the remainder of the goal using the same matrix and 
path, under the substitution resulting from the matching process. (That is, free 
variables in literals in the goal and the path may have become instantiated.) 

If a match to the complement of the literal is not found on the path, that is, 
if all attempts to apply instances of reduction have failed, then this is treated as 
either extension1 or extension2, depending on whether or not the clause selected 
next is ground. A clause is selected by the technique described above. Then a 
literal matching the complement of the goal literal is selected from the clause. 
(If this fails then the program backtracks and selects another clause.) The test 
Cla1==Cla2 is used, as explained below, to determine if the selected clause is 
ground, and the matrix for the subproof is constructed accordingly, either with 
or without the chosen clause. If the path limit has not been reached, the prover 
recursively proves the body of the selected clause under the new path assumption 
and substitution, and, if it succeeds, goes on to prove the remainder of the current 
goal clause. As the depth-first prover is complete for propositional logic, the path 
limit check is not done if the selected clause is ground. 

Note that P -> Q ; R is an extra-logical control structure corresponding to an 
if-then-else statement. The difference between this and ((P,Q) ; (\+P,R)) 
is that the latter allows for backtracking and retrying the test under another 
substitution, whereas the former allows the test to be computed only once and 
an absolute choice is made at that point. It can also be written without R, as is 
done in some cases here. Such use is, in essence, a hidden use of the Prolog cut 
operator, which is used for pruning search. 
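
The difference between the two forms can be observed with a small experiment. This is an illustrative sketch only; the predicate names committed_choice/1 and backtracking_choice/1 are hypothetical and do not occur in leanCoP.

```prolog
:- use_module(library(lists)).

% With P -> Q ; R the test member(X,[1,2,3]) is solved once (X = 1) and the
% choice is final, so the failure of X > 1 makes the whole goal fail.
committed_choice(X) :-
    (   member(X, [1,2,3]) -> X > 1
    ;   X = none
    ).

% With (P,Q) ; (\+P,R) the test can be retried under another substitution,
% so X = 2 and X = 3 are both found on backtracking.
backtracking_choice(X) :-
    (   member(X, [1,2,3]), X > 1
    ;   \+ member(X, [1,2,3]), X = none
    ).
```

Here committed_choice(X) has no solutions, while backtracking_choice(X) succeeds twice. In leanCoP the committed form is applied to the complement computation, where only one outcome is wanted.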

As mentioned above, the use of Prolog terms to represent atomic formulas 
introduces complications. This is because the free variables of a term, intended 
to represent the implicitly quantified variables of the atoms, can become bound if 
the term is compared (unified) with another term. In order to avoid the variables 
in clauses in the matrix from being so bound, when a clause is selected from the 
matrix, a copy with a fresh set of variables is produced using copy_term, and 
that copy is the clause that is used. Thus, the comparison Cla1==Cla2, which 
checks for syntactic identity, succeeds only if there were no variables in the 
original term Cla1 (since they would have been modified by copy_term), and, 
hence, if that term was ground. 

Fig. 3. A proof system for a fragment of linear logic 
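
The groundness test that leanCoP obtains from copy_term and == can be sketched on its own as follows. The predicate name ground_by_copy/1 is hypothetical; the sketch assumes ordinary (non-attributed) variables.

```prolog
% ground_by_copy(+Term)
% Succeeds when Term contains no variables: copying a ground term yields a
% syntactically identical term, whereas any variable is replaced by a fresh
% one, which makes the == comparison fail.
ground_by_copy(Term) :-
    copy_term(Term, Copy),
    Term == Copy.
```

For instance, ground_by_copy([p(a), -q(b)]) succeeds, while ground_by_copy([p(X), -q(b)]) fails. Since the copy is needed anyway, the comparison gives the groundness information without a separate call to ground/1.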

Because Prolog unification is unsound, as it lacks the “occurs check” for 
barring the construction of cyclic unifiers, if the prover is to be sound we must 
force sound unification when comparing literals. In ECLiPSe Prolog, used in the 
original leanCoP paper, this is done with a global switch, affecting all unification 
in the system. In SICStus Prolog, used for the tests in this paper, it is done 
with the predicate unify_with_occurs_check. This predicate is used within 
the member_oc and append_oc predicates, whose definitions have been elided in 
the code above. 
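
Since the definitions of member_oc and append_oc are elided, the following is only one plausible way to write them, given as a sketch and not as the authors' code. It relies solely on the ISO built-in unify_with_occurs_check/2 and on append/3.

```prolog
:- use_module(library(lists)).

% member_oc(?X, +List): like member/2, but X is unified with a list
% element using sound unification (with the occurs check).
member_oc(X, [Y|_]) :- unify_with_occurs_check(X, Y).
member_oc(X, [_|T]) :- member_oc(X, T).

% append_oc(?Before, [?X|?After], +List): split List around an element
% that unifies with X under the occurs check.
append_oc(Before, [X|After], List) :-
    append(Before, [Y|After], List),
    unify_with_occurs_check(X, Y).
```

With these definitions a goal such as member_oc(f(X), [X]) fails, whereas plain member/2 would build a cyclic term (or loop, depending on the system) at that point.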

Many of these complications could have been avoided by using λProlog, 
which supports the use of λ-terms as data for representing name-binding 
structures, and whose unification algorithm is sound [11]. 

4 A Brief Introduction to Linear Logic Programming 

Linear logic was first proposed by Girard in 1987 [4]. Figure 3 gives a Gentzen 
sequent calculus for part of the fragment of intuitionistic linear logic which forms 
the foundation of the logic programming language Lolli, named for the linear 
logic implication operator, ⊸, known as lollipop. The calculus is not the standard 
one, but for this fragment is equivalent to it, and is easier to explain in the context 
of logic programming. In these sequents, the left-hand side has two parts: the 
context Γ holds assumptions that can be freely reused and discarded, as in 
traditional logics, while the assumptions in Δ, in contrast, must be used exactly 
once in a given branch of a tree. The two implication operators, ⇒ and ⊸, are 
used to add assumptions to the unrestricted and linear contexts, respectively. In 
Lolli they are written => and -o. 

In the absence of contraction and weakening (that is, the ability to freely 
reuse or discard assumptions, respectively), all of the other logical operators 
split into two variants as well. For example, the conjunction operator splits into 
tensor, ⊗, and with, &. In proving a conjunction formed with ⊗, the current set 
of restricted assumptions, Δ, is split between the two conjuncts: those not used 
in proving the first conjunct must be used while proving the second. To prove 
a & conjunction, the set of assumptions is copied to both sides: each conjunct’s 
proof must use all of the assumptions. In Lolli, the ⊗ conjunction is represented 
by the familiar “,”. This is a natural mapping, as we expect the effect of a 
succession of goals to be cumulative: each has available to it the resources not 
yet used by its predecessors. The & conjunction, which is less used, is written 
“&”. 

Thus, a query showing that two dollars are needed to buy pizza and soda 
when each costs a dollar can be written in Lolli as: 

?- (dollar -o pizza) => (dollar -o soda) => 

(dollar -o dollar -o (pizza, soda) ) 

which would succeed. In contrast, a single, ordinary dollar would be insufficient, 
as in the failing query: 

?- (dollar -o pizza) => (dollar -o soda) => (dollar -o (pizza, soda) ) 

If we wished to allow ourselves a single, infinitely reusable dollar, we would 
write: 

?- (dollar -o pizza) => (dollar -o soda) => (dollar => (pizza, soda) ) 
which would also succeed. Finally, the puzzling query: 

?- (dollar -o pizza) => (dollar -o soda) => (dollar -o (pizza & soda)) 

would also succeed. It says that with a dollar it is possible to buy soda and 
possible to buy pizza, but not both at the same time. 

It is important to note that while the implication operators add clauses to 
a program while it is running, they are not the same as the Prolog assert 
mechanism. First, the addition is scoped over the subgoal on the right of the 
implication, whereas a clause asserted in Prolog remains until it is retracted. 
So, for example, the following query will fail: 

?- (dollar => dollar), dollar. 

Assumed clauses also go out of scope if search backtracks out of the subordinate 
goal. Second, whereas assert automatically universalizes any free variables in 
an added clause, in Lolli clauses added with implication can contain free logic 
variables, which may get bound when the clause is used to prove some goal. 
Therefore, whereas the Prolog query: 

?- assert(p(X)) , p(a), p(b) . 

will succeed, because X is universalized, the seemingly similar Lolli query: 

?- p(X) => (p(a), p(b)). 

will fail, because the attempt to prove p(a) causes the variable X to become 
instantiated to a. If we desire the other behavior, we must quantify explicitly: 
?- (forall X\p(X)) => (p(a), p(b)). 

What’s more, any action that causes the variable X to become instantiated will 
affect instances of that variable in added assumptions. For example, the query: 

?- p(X) => r(a) => (r(X) , p(b)). 

will fail, since proving r(X) causes the variable X to be instantiated to a, both in 
that position, and in the assumption p(X). Our implementation of lolliCoP will 
rely crucially on all these behaviors. 

Though there are two forms of disjunction in linear logic, only one, “⊕”, is 
used in Lolli. It corresponds to the traditional one and is therefore written with 
a semicolon in Lolli as in Prolog. 

There are also two forms of truth, ⊤ and 1. The latter, which Lolli calls 
“true”, can only be proved if all the linear assumptions have already been used. 
In contrast, ⊤ is provable even if some resources are, as yet, unused. Thus if a 
⊤ occurs as one of the conjuncts in a ⊗ conjunction, then the conjunction may 
succeed even if the other conjuncts do not use all the linear resources. The ⊤ is 
seen to consume the leftovers. Therefore, Lolli calls this operator “erase”. 

It is beyond the scope of this paper to demonstrate the applications of all 
these operators. Many good examples can be found in the literature, particularly 
in the papers on Lygon and Lolli [5,6]. The proof theory of this fragment has 
also been developed extensively [6]. Of crucial importance is that there is a 
straightforward goal-directed proof procedure (conceptually similar to the one 
used for Prolog) that is sound and complete for this fragment of linear logic. 

5 The lolliCoP Theorem Prover 

Figure 4 gives the code for lolliCoP, a reimplementation of leanCoP in Lolli/LLP.^ 
The basic premise of its design is that, rather than being passed around as a list, 
the matrix will be loaded as assumptions into the proof context and accessed 
directly. In addition, ground clauses will be added as linear resources, since the 
calculus dictates that in any given branch of the proof, a ground clause should 
be removed from the matrix once it is used. Non-ground clauses are added to the 
intuitionistic (unbounded) context. In either case (ground or non-ground) these 
assumptions are stored as clauses for the special predicate cl/1. Literals in the 
path are also stored as assumptions added to the program. They are unbounded 
assumptions added as clauses of the special predicate path/1. While Lolli supports 
the λ-terms of λProlog, LLP does not. Therefore, clauses are still represented as 
lists of literals, which are represented as terms as before. 

The proof procedure begins with a call to prove/1 with a matrix to be proved. 
This predicate first reverses the order of the clauses, so that when they are added 
recursively the resultant context will be searched in their original order. It then 
calls pr/1 to load the matrix into the unrestricted and linear proof contexts, 
as appropriate. First, however, it checks whether the entire matrix is ground 

^ Because the LLP parser is written in Prolog, LLP uses -<> for —o, rather than -o. 
prove(Mat) :- 
    reverse(Mat,Mat1), 
    (ground(Mat) -> propositional => pr(Mat1) 
                  ; pr(Mat1) 
    ). 

pr([]) :- p(1). 
pr([Cla|Mat]) :- 
    (ground(Cla) -> (cl(Cla) -<> pr(Mat)) 
                  ; (cl(Cla) => pr(Mat)) 
    ). 

p(PathLim) :- cl(Cla), \+member(-_,Cla), 
    copy_term(Cla,Cla1), prove(Cla1,PathLim). 
p(PathLim) :- \+propositional, 
    PathLim1 is PathLim+1, p(PathLim1). 

prove([],_) :- erase. 
prove([Lit|Cla],PathLim) :- 
    (-NegLit=Lit ; -Lit=NegLit) -> 
    ( path(NegLit), erase ; 
      cl(Cla1), copy_term(Cla1,Cla2), append(ClaA,[NegLit|ClaB],Cla2), 
      append(ClaA,ClaB,Cla3), (Cla1==Cla2 -> true ; PathLim>0), 
      PathLim1 is PathLim-1, path(Lit) => prove(Cla3,PathLim1) 
    ) & prove(Cla,PathLim). 

Fig. 4. The lolliCoP theorem prover 



or not. If it is, a flag predicate is assumed (using =>) to indicate that this is a 
propositional problem, and that iterative deepening is not necessary. 

The predicate pr/1 takes the first clause out of the given matrix, adds it to 
the current context as either a linear or unlimited assumption, as appropriate, 
and then calls itself recursively as the goal nested under the implication. Thus, 
each call to this predicate will be executed in a context which contains the 
assumptions added by all the previous calls. When the end of the given matrix 
is reached, the first clause of pr/1 calls p/1 with an initial path-length limit of 
1, so that a start clause can be selected, and the proof search begun. 

The clauses for p/1 take the place of the clauses for prove/2 in leanCoP. 
They are responsible for managing the iterative deepening, and for selecting 
the start clause for the search. A clause is selected just by attempting to prove 
the predicate cl/1 which will succeed by matching one of the clauses from the 
matrix which has been added to the program. This is significantly simpler than 
the process in leanCoP. Once the program finds a purely positive start clause, 
it is copied and its proof is attempted at the current path-length limit. Should 
that process fail for all possible choices of start clause, the second clause of p/1 
is invoked. It checks to see that this is not a purely propositional problem, and 
if it is not, makes a recursive call with the path-length limit one higher. 
The predicate prove/2 takes the role of prove/4 in leanCoP; because the 
matrix and path are stored in the proof context, they no longer need to be 
passed around as arguments. The first clause, corresponding to extension0, here 
has a body consisting of the erase (⊤) operator. Its purpose is to discard any 
linear assumptions (i.e. ground clauses in the matrix) that were not used in this 
branch of the proof. This is necessary since we are building a prover for classical 
logic, in which assumptions can be discarded. 

The second clause of this predicate is, as before, the core of the prover, 
covering the remaining three rules. It begins by selecting a literal from the goal 
clause and forming its complement. If a literal matching the complement occurs 
as an argument to one of the assumed path/1 clauses, then this is an instance 
of the reduction rule and this branch is terminated. As with the extension0 rule, 
erase is used to discard unused assumptions. 

Otherwise, the predicate cl/1 extracts a clause from the matrix, which is 
then copied and checked to see if it contains a match for the complement of 
the goal literal. If the clause is ground or if the path-length limit has not been 
reached, the current literal is added to the path and prove/2 is called recursively 
as a subordinate goal (within the scope of the assumption added to the path) to 
prove the body of the selected clause. 

If this was an instance of the reduction rule, or if it was an instance of 
extension1 or extension2 and the proof of the body of the matching clause 
succeeded, the call to prove/2 finishes with a recursive call to prove the rest of the 
current goal clause. Because this must be done using the same matrix and path 
that were used in the other branch of the proof, the two branches are joined with 
a & conjunction. Thus the context is copied independently to the two branches. 

It is important to notice that, other than checking whether the path-length 
limit has been reached, there is no difference between the cases when the selected 
clause is ground or not. If it was ground, it was added to the context using linear 
implication, and, since it has been used (to prove the cl/1 predicate), it has 
automatically been removed from the program, and, hence, the matrix. Also, 
lolliCoP uses a different method for checking path length against the limit: the 
limit is simply decremented each time a literal is added to the path. This is done 
because there is no way to access the whole path to check its length, but has the 
advantage of being significantly more efficient as well. 

It is also important to note that, as mentioned before, we rely on the fact 
that free variables in assumptions retain their identity as logic variables and may 
become instantiated subsequently. In particular, the literals added to the path 
may contain instances of free variables from the goal clause from which they de- 
rive. Anything which causes these variables to become instantiated will similarly 
affect those occurrences in these assumptions. Thus, this technique could not be 
implemented using Prolog’s assert mechanism. In any case, asserted clauses 
are generally not as fast as compiled ones. 
6 Performance Analysis 

We have tested lolliCoP on the 2200 clausal form problems in the TPTP library 
version 2.3.0 [15,8]. These consist of 2193 problems known to be unsatisfiable 
(or valid using positive representation) and 7 propositional problems known to 
be satisfiable (or invalid). Each problem is rated from 0.00 to 1.00 relative to 
its difficulty. A rating of “?” means the difficulty is unknown. No reordering of 
clauses or literals has been done. 

The tests were performed on a Linux system with a 550MHz Pentium III 
processor and 128M bytes of memory. The programs were compiled with version 
0.50 of LLP which generated abstract machine code executed by an emulator 
written in C. The time limit for all proof attempts was 300 seconds. 



Table 1. Overall performance of Otter, leanCoP, and lolliCoP 

                        Total   Otter        leanCoP     lolliCoP    lolliCoP2 
Solved                  2200    1602 (73%)   810 (37%)   822 (37%)   880 (40%) 
  0 to <1 second                1209         541         554         614 
  1 to <10 seconds              142          135         124         117 
  10 to <100 seconds            209          93          91          94 
  100 to <200 seconds           31           18          25          34 
  200 to <300 seconds           11           23          28          21 
Problems rated 0.00     1308    1230 (94%)   713 (55%)   716 (55%)   737 (56%) 
Problems rated >0.00    733     249 (34%)    76 (10%)    83 (11%)    118 (16%) 
Problems rated ?        159     123 (77%)    21 (13%)    23 (14%)    25 (16%) 



The overall performance of Otter 3.1 (with MACE 1.4) [8,10], leanCoP [14], 
and lolliCoP, in terms of the number of problems solved, is shown in Table 1. 
The table also includes data for an improved version of lolliCoP, called lolliCoP2, 
discussed in the next section. The results for leanCoP were obtained in the same 
environment as those for lolliCoP, using SICStus Prolog 3.7.1, and are better 
than those reported by the authors [14]. The results for Otter 3.1 (with MACE 
1.4), which is not publicly available, are taken from a report by its developers 
[8]. These results were produced on a 400MHz Pentium II, which is somewhat 
slower than the machine we used. 

It is interesting to note that lolliCoP solved 57 problems, and lolliCoP2 77,
which Otter cannot solve. Most of these (47 for lolliCoP and 63 for lolliCoP2) are
rated higher than 0.00. It should also be noted that leanCoP solved ten problems
that neither lolliCoP nor lolliCoP2 solved. Since nine of these were rated 0.00,
and given the structural similarities of the systems, we believe this to be due to
serendipitous advantages with respect to clause ordering, since leanCoP orders
clauses slightly differently. Fig. 5 depicts the overlap of problems solved by each
system.

Fig. 5. Performance of Otter, leanCoP and lolliCoP classified by problem rating



6.1 Performance Comparison 

In order to produce a more detailed comparison, we tested all the systems on the
118 problems rated greater than 0.00 which lolliCoP2 can solve. Because Otter
3.1 is not yet available, we used Otter 3.0.6 instead. All tests were made on the
same 550MHz Pentium III. Table 2 gives the results of this comparison. (Otter
results labeled “error” refer to an empty set-of-support.)

As mentioned in the introduction, although the table shows lolliCoP as almost
consistently outpacing leanCoP, these results do not tell the entire story. Because
LLP is a first-generation implementation, neither its code generator nor its
runtime system is nearly as sophisticated as that of SICStus. To adjust for this
factor we also executed a version of leanCoP using the LLP compiler and runtime
system (which is possible since Lolli is a superset of Prolog). In this test, looking
only at the problems that it succeeded in solving, leanCoP took 2.3 times as long
as lolliCoP, providing a more accurate measure of the benefits accrued from the
logical treatment.

Table 3a compares the performance of all four systems on the 33 problems 
that they can all solve. Total CPU time is shown, along with a speedup ratio 
relative to leanCoP (under SICStus). On just these problems, lolliCoP has almost 
the same performance as Otter. However, comparing the result of 36 problems 
solved by both Otter and lolliCoP, Otter is 71% faster as shown in Table 3b. 
Finally, Table 3c shows a similar analysis for the 76 problems that lolliCoP and 
leanCoP can both solve. 
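
The speedup ratios reported in Table 3 can be recomputed from the total CPU times. The following is a minimal sketch in Python, assuming that the ratio is simply the baseline system's total time divided by each system's total time; the function name `speedup_ratios` and the dictionary layout are illustrative and do not come from the original experiments. The times are those listed in Tables 3a and 3b.

```python
def speedup_ratios(total_cpu, baseline):
    """Return baseline_time / system_time for each system, rounded to 2 places."""
    base = total_cpu[baseline]
    return {name: round(base / t, 2) for name, t in total_cpu.items()}


# Total CPU times in seconds, copied from Table 3a (33 problems, all four systems).
table_3a = {"Otter": 1143.03, "leanCoP": 1590.66,
            "lolliCoP": 1139.41, "lolliCoP2": 338.47}

# Total CPU times in seconds, copied from Table 3b (36 problems, Otter and lolliCoP).
table_3b = {"Otter": 1152.40, "lolliCoP": 1969.67, "lolliCoP2": 450.57}

if __name__ == "__main__":
    # Table 3a uses leanCoP as the baseline, Table 3b uses lolliCoP.
    print(speedup_ratios(table_3a, "leanCoP"))
    print(speedup_ratios(table_3b, "lolliCoP"))
```

Under this reading, a ratio of 1.71 for Otter in Table 3b corresponds to the statement that Otter is 71% faster than lolliCoP on those 36 problems.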

7 Improvements to the lolliCoP Prover 

In the design of leanCoP, Otten and Bibel seem to have been focused primarily on
keeping the code as short as possible. In the process of reimplementing the system
in Lolli, a simple but significant performance improvement became apparent,
which we discuss here.

The most obvious inefficiency in the system as described thus far is that
copy_term is called in order to create a new set of logic variables in a selected
clause, even when the clause is ground, since that test is not made till later on.
Given the size of some of the clauses in the problems in the TPTP library, this
can be quite inefficient. While the obvious solution would be to move the use
of copy_term into the body of the if-then-else along with the path-limit check,
Lolli affords a more creative solution.

Table 2. Problems solved by lolliCoP2 and rated higher than 0.00

Problem            Rating   Otter    leanCoP  lolliCoP  lolliCoP2
BOO012-1           (0.17)   3.44     8.13     7.33      1.28
…                  (0.50)   >300     15.83    12.05     4.26
CAT003-3           (0.11)   >300     2.43     1.70      0.34
CAT012-4           (0.17)   0.26     19.95    14.88     4.57
COL002-3           (0.33)   >300     0.01     0.03      0.01
COL075-1           (0.50)   >300     >300     275.77    60.29
FLD002-3           (0.67)   1.20     201.23   162.01    43.73
FLD003-1           (0.67)   >300     >300     264.10    70.76
FLD004-1           (0.67)   >300     >300     >300      201.74
FLD009-3           (0.33)   >300     >300     272.94    72.59
FLD013-1           (0.67)   >300     0.46     0.48      0.15
FLD013-2           …        >300     >300     >300      106.83
FLD013-3           (0.33)   >300     >300     >300      153.85
FLD013-4           (0.33)   3.19     >300     >300      270.74
FLD016-3           (0.33)   11.90    >300     >300      155.71
FLD018-1           (0.33)   >300     >300     >300      101.86
FLD019-1           (0.33)   >300     >300     >300      196.80
FLD022-3           (0.33)   12.45    >300     >300      155.83
FLD023-1           (0.33)   >300     0.62     0.48      0.13
FLD025-1           (0.67)   >300     0.45     0.49      0.15
FLD025-3           (0.33)   >300     >300     >300      130.73
FLD028-3           (0.33)   13.45    >300     >300      187.66
FLD030-1           (0.33)   0.41     0.03     0.02      0.01
FLD030-2           (0.33)   >300     0.44     0.35      0.11
FLD031-1           (0.33)   >300     >300     >300      268.48
FLD032-1           (0.33)   >300     >300     >300      247.57
FLD035-3           (0.33)   14.11    >300     >300      257.05
FLD036-3           (0.33)   13.73    >300     >300      135.22
FLD037-1           (0.33)   >300     1.64     1.25      0.32
FLD060-1           (0.67)   >300     0.59     0.51      0.15
FLD060-2           (0.67)   >300     >300     >300      127.11
FLD061-1           (0.67)   >300     0.66     0.58      0.17
FLD061-2           (0.67)   >300     >300     >300      155.31
FLD064-1           (0.67)   >300     >300     >300      114.86
FLD067-1           (0.33)   >300     1.47     1.20      0.31
FLD067-3           (0.33)   20.08    186.68   150.37    40.97
FLD069-1           (0.33)   >300     >300     >300      125.96
FLD070-1           (0.33)   >300     2.52     0.68      0.18
FLD071-3           (0.33)   2.52     0.36     0.34      0.08
GEO026-3           (0.11)   2.15     20.34    19.16     2.35
GEO030-3           (0.44)   8.04     >300     271.72    30.90
GEO032-3           (0.25)   1.16     >300     292.07    32.02
GEO033-3           (0.38)   4.81     >300     >300      39.41
GEO041-3           (0.22)   0.21     42.28    32.90     3.60
GEO051-3           (0.25)   7.26     >300     >300      56.82
GEO064-3           (0.12)   0.33     >300     >300      55.07
GEO065-3           (0.12)   0.34     >300     >300      55.11
GEO066-3           (0.12)   0.32     >300     >300      55.14
GRP008-1           (0.22)   0.69     1.00     0.72      0.12
HEN007-6           (0.17)   0.12     >300     >300      211.72
LCL045-1           (0.20)   98.03    1.31     0.90      0.50
LCL097-1           (0.20)   0.26     0.67     0.20      0.12
LCL111-1           (0.20)   0.13     0.20     0.14      0.07
LCL130-1           (0.20)   0.01     0.03     0.01      0.02
LCL195-1           (0.20)   error    18.76    15.23     6.93
LCL230-1           (0.40)   error    209.09   133.76    61.13
LCL231-1           (0.40)   error    >300     189.14    85.31
PLA004-1           (0.40)   >300     4.00     3.06      2.46
PLA004-2           (0.40)   >300     5.97     5.09      3.90
PLA005-1           (0.40)   >300     0.44     0.36      0.24
PLA005-2           (0.40)   >300     0.10     0.06      0.03
PLA007-1           (0.40)   >300     0.14     0.13      0.08
PLA008-1           (0.40)   >300     251.50   204.21    142.95
PLA009-1           (0.40)   >300     0.06     0.05      0.03
PLA009-2           (0.40)   >300     2.10     1.73      1.20
PLA010-1           (0.40)   >300     250.53   203.04    142.21
…                  (0.40)   >300     0.14     0.09      0.05
PLA011-2           (0.40)   >300     0.45     0.36      0.24
PLA012-1           (0.40)   >300     66.06    52.21     36.79
PLA013-1           (0.40)   >300     0.24     0.18      0.11
PLA014-1           (0.40)   >300     2.06     1.61      1.21
PLA014-2           (0.40)   >300     2.13     1.75      1.33
PLA016-1           (0.40)   >300     0.07     0.07      0.04
PLA019-1           (0.40)   >300     0.06     0.05      0.03
PLA021-1           (0.40)   >300     0.18     0.13      0.07
PLA022-1           (0.40)   >300     0.40     0.32      0.24
PLA022-2           (0.40)   >300     0.03     0.02      0.01
PLA023-1           (0.40)   >300     72.74    57.73     40.71
PUZ034-1.004       (0.67)   error    15.87    12.42     9.83
RNG006-2           (0.20)   4.69     0.26     0.35      0.06
RNG040-1           (0.11)   0.05     0.01     0.01      0.01
RNG040-2           (0.22)   0.10     0.21     0.19      0.04
RNG041-1           (0.22)   0.16     43.86    36.25     6.37
SET014-2           (0.33)   176.24   174.31   134.35    27.97
SET016-7           (0.12)   >300     10.99    8.29      1.05
SET018-7           (0.12)   >300     11.13    8.37      1.06
SET041-3           (0.44)   >300     59.88    45.36     4.88
SET060-6           (0.12)   0.19     0.04     0.03      0.00
SET060-7           (0.12)   0.33     0.05     0.03      0.00
SET083-7           (0.12)   24.39    40.34    34.70     5.41
SET085-6           (0.12)   12.72    >300     >300      65.58
SET085-7           (0.25)   65.79    46.01    33.55     5.22
SET119-7           (0.25)   177.97   60.35    48.50     6.71
SET120-7           (0.25)   181.62   60.23    48.46     6.71
SET121-7           (0.25)   178.42   72.81    55.77     7.63
SET122-7           (0.25)   180.13   72.83    55.82     7.64
SET152-6           (0.12)   0.45     3.50     2.60      0.38
SET153-6           (0.12)   >300     0.70     0.56      0.10
SET187-6           (0.38)   >300     18.01    13.53     2.27
SET196-6           (0.12)   10.59    >300     >300      196.13
SET197-6           (0.12)   10.63    >300     >300      196.06
SET199-6           (0.25)   >300     >300     >300      203.63
SET231-6           (0.12)   >300     12.86    9.74      1.63
SET234-6           (0.25)   >300     >300     >300      251.18
SET252-6           (0.25)   61.53    >300     >300      202.60
SET253-6           (0.25)   >300     >300     >300      203.24
SET451-6           (0.12)   >300     >300     >300      281.67
SET553-6           (0.25)   36.81    >300     >300      204.46
SYN048-1           (0.20)   0.00     0.00     0.00      0.00
SYN074-1           (0.11)   0.87     >300     >300      74.43
SYN075-1           (0.11)   0.17     >300     266.47    49.18
SYN102-1.007:007   (0.33)   1.00     39.38    39.70     22.95
SYN311-1           (0.20)   error    123.36   99.86     45.68

[Incomplete rows, values only: (0.12) 3.40 … 49.37 4.44 … 17.39 237.81 … 9.43 …
0.43 0.28 0.20 0.17 … 2.67 … 231.07 … 0.89 … 129.91]

Table 3. Comparison of Otter, leanCoP, and lolliCoP

(a) 33 problems solved by Otter, leanCoP, and lolliCoP

                   Otter     leanCoP   lolliCoP  lolliCoP2
Total CPU time     1143.03   1590.66   1139.41   338.47
Average CPU time   34.64     48.20     34.53     10.26
Speedup Ratio      1.39      1.00      1.40      4.70

(b) 36 problems solved by Otter and lolliCoP

                   Otter     lolliCoP  lolliCoP2
Total CPU time     1152.40   1969.67   450.57
Average CPU time   32.01     54.71     12.52
Speedup Ratio      1.71      1.00      4.37

(c) 76 problems solved by leanCoP and lolliCoP

                   leanCoP   lolliCoP  lolliCoP2
Total CPU time     2757.83   2038.58   853.24
Average CPU time   36.29     26.82     11.23
Speedup Ratio      1.00      1.35      3.23

In lolliCoP we already check whether each clause is ground or not at the
time the clauses are added into the proof context in pr/1. We can further take
advantage of that check by not only adding the clauses differently, but by adding
different sorts of clauses. In lolliCoP a clause c (ground or not) is represented
by the Lolli clause cl(c). We can continue to represent ground clauses in the
same way, but when c is non-ground, instead represent it by the Lolli clause:
cl(C1) :- copy_term(c, C1). When this clause is used, it will return not the
original clause, c, but a copy of it. To be precise, we replace the second clause
of pr/1 with a clause of the form:

pr([C|Mat]) :-
    (ground(C) -> (cl(C) -<> pr(Mat))
     ; (forall C1\ cl(C1) :- copy_term(C,C1)) => pr(Mat)).

Note the use of explicit quantification over the variable C1.

In lolliCoP2 the loaded clauses are further modified to take a second parameter,
the path-depth limit. The Lolli clauses for ground clauses simply ignore this
parameter. The ones for non-ground clauses check it first and proceed only if the
limit has not yet been reached. In this version of the prover there is no check
whatsoever for the ground status of a clause in the core (prove/2). This removes
the potentially significant computational cost of checking the ground status each
time a clause is selected: an operation linear in the size of the selected clause.
Space constraints keep us from including the full program.

Taken together these small improvements actually triple the performance of 
the system. While the first optimization can be added, awkwardly, to leanCoP, 
it is not possible to do away entirely with the groundness check in that setting. 
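
The idea of deciding groundness once, at load time, can be sketched outside Lolli as well. The following Python fragment is only an illustration under stated assumptions: terms are nested tuples, strings starting with an uppercase letter or an underscore are logic variables, and the names `is_ground`, `copy_term` and `load_clause` are hypothetical stand-ins, not the LLP implementation. A ground clause is stored as is; a non-ground clause is stored behind an accessor that returns a freshly renamed copy on every use, mirroring the two sorts of cl clauses described above.

```python
import itertools

_fresh_ids = itertools.count()  # source of fresh variable numbers


def is_var(t):
    """Assumption: variables are strings starting with an uppercase letter or '_'."""
    return isinstance(t, str) and (t[:1].isupper() or t[:1] == "_")


def is_ground(t):
    """True if the term (nested tuples of strings) contains no variable."""
    if is_var(t):
        return False
    if isinstance(t, tuple):
        return all(is_ground(a) for a in t)
    return True


def copy_term(t, renaming=None):
    """Return a copy of t in which each variable is consistently replaced by a fresh one."""
    if renaming is None:
        renaming = {}
    if is_var(t):
        if t not in renaming:
            renaming[t] = f"_G{next(_fresh_ids)}"
        return renaming[t]
    if isinstance(t, tuple):
        return tuple(copy_term(a, renaming) for a in t)
    return t


def load_clause(clause):
    """Test groundness once, when the clause is loaded; return an accessor."""
    if is_ground(clause):
        return lambda: clause             # ground: hand out the clause itself
    return lambda: copy_term(clause)      # non-ground: hand out a fresh copy


if __name__ == "__main__":
    ground = load_clause((("p", "a"), ("q", "b")))
    open_clause = load_clause((("p", "X"), ("q", "X", "Y")))
    print(ground())        # the stored clause, no copying
    print(open_clause())   # variables renamed apart on each call
```

With this arrangement the code that selects a clause never tests groundness itself; the cost of that test is paid once per clause rather than once per selection.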

8 Conclusion 

Lean theorem proving began with leanTAP [1], which provided an existence proof 
that it was possible to implement interesting theorem proving techniques using 
clear short Prolog programs. It was not expected, however, to provide particu- 
larly powerful systems. Recently, leanCoP showed that these programs can be at 
once perspicuous and powerful. 

However, to the extent that these programs rely on the use of term-level
Prolog data structures to maintain their proof contexts, they require the use of
list manipulation predicates that are neither particularly fast nor clear. In this
paper we have shown that by representing the proof context within the proof
context of the meta-language, we can obtain a program that is at once clearer,
simpler, and faster.

Source code for the examples in this paper, as well as the LLP compiler can 
be found at http://www.cs.hmc.edu/~hodas/research/lollicop. 

1 Introduction

The Muscadet theorem prover is a knowledge-based system. It is based on nat- 
ural deduction, following the terminology of Bledsoe ([1], [2]), and uses methods 
which resemble those used by humans. It is composed of an inference engine, 
which interprets and executes rules, and of one or several bases of facts, which 
are the internal representations of “theorems to be proved” . 

Rules are either universal and put into the system, or built by the system 
itself by metarules from data (definitions and lemmas) given by the user. They 
are in the form if <list of conditions> , then <list of actions>. Actions may be 
“super-actions” which are defined by packs of rules. 

The representation of a “theorem to be proved” (or a sub-theorem) is a 
description of its state during the proof. It is composed of objects that were 
created, of hypotheses, of a conclusion to be proved, of rules called active rules, 
possibly of sub-theorems, etc. At the beginning, it is only composed of a conclu- 
sion, which is the initial statement of the theorem to be proved, and of a list of 
“active” rules, relevant for this theorem, and which were built automatically. 

Rules may add new hypotheses, modify the conclusion, create objects, create
sub-theorems or build new rules which are local for a (sub-)theorem. A theorem is
proved, for example, if the conclusion to be proved was added as a new hypothesis
or if there is an existential conclusion $\exists X\, p(X)$ and a hypothesis $p(a)$.

2 Example: Transitivity of Inclusion 

Prove the transitivity of inclusion, that is the theorem

$\forall A\, \forall B\, \forall C\, (A \subset B \wedge B \subset C \Rightarrow A \subset C)$

with the definition of inclusion $A \subset B \Leftrightarrow \forall X\, (X \in A \Rightarrow X \in B)$.

To prove this theorem Muscadet creates objects a, b and c by applying three
times the rule

Rule $\forall$: if the conclusion is $\forall X\, p(X)$,
    then create a new object x and the new conclusion is $p(x)$

and the new conclusion is $a \subset b \wedge b \subset c \Rightarrow a \subset c$. Then the rule

Rule $\Rightarrow$: if the conclusion is $H \Rightarrow C$,
    then add the hypothesis H and the new conclusion is C

replaces the conclusion by $a \subset c$ and adds the two hypotheses $a \subset b$ and $b \subset c$.
In effect, hypotheses H are analyzed before being added: a super-action
addhyp(H) contains, among others, the rule

To addhyp(H): if H is a conjunction,
    then successively add all the elements of the conjunction

The conclusion is then replaced by its definition $\forall X\, (X \in a \Rightarrow X \in c)$ by
applying the rule

Rule defconcl: if the conclusion is C
    and there exists a definition of the form $C \Leftrightarrow D$,
    then the new conclusion is D

By the preceding rules $\forall$ and $\Rightarrow$, there is then a new object x, a new hypothesis
$x \in a$, and the conclusion is now $x \in c$. The following rule

Rule $\subset$: if there are hypotheses $A \subset B$ and $X \in A$,
    then add the hypothesis $X \in B$

is a rule that was automatically built by Muscadet from the definition of
inclusion. Here it is applied twice and adds the hypotheses $x \in b$ then $x \in c$, which
is the conclusion to be proved. The proof ends by applying the rule

Rule stop1: if the conclusion C is also a hypothesis,
    then set the conclusion to true

Muscadet is also able to work in second order predicate calculus, and the
preceding example may be written $transitive(\subset)$ with the definition of
transitivity $transitive(R) \Leftrightarrow \forall A\, \forall B\, \forall C\, (R(A, B) \wedge R(B, C) \Rightarrow R(A, C))$.
After the conclusion $transitive(\subset)$ has been replaced by its definition, the proof
is the same as above.
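
The sequence of rule applications in this example can be replayed with a few lines of code. The following Python sketch is not Muscadet: it is a toy loop, written under the assumption that formulas are nested tuples and that fresh objects are simply named a, b, c, x, ... in order of creation. The function names `subst` and `prove` and the trace labels are illustrative only. It applies the rules described above (for the universal quantifier, for implication, defconcl, the rule built from the definition of inclusion, and the stop rule) to the transitivity theorem.

```python
def subst(f, var, obj):
    """Replace variable `var` by `obj` in formula `f` (nested tuples)."""
    if f == var:
        return obj
    if isinstance(f, tuple):
        if f[0] == "forall" and f[1] == var:
            return f  # `var` is rebound here, leave the subformula alone
        return tuple(subst(x, var, obj) for x in f)
    return f


def prove(theorem):
    """Replay the rules of the example; return (proved?, list of rules applied)."""
    hyps, concl, trace = [], theorem, []
    names = iter("abcxyz")  # assumption: enough fresh object names for this example

    def addhyp(h):
        if h[0] == "and":                 # super-action addhyp: split conjunctions
            addhyp(h[1])
            addhyp(h[2])
        elif h not in hyps:
            hyps.append(h)

    while True:
        if concl in hyps:                 # stop rule: conclusion is a hypothesis
            trace.append("stop")
            return True, trace
        if concl[0] == "forall":          # rule for the universal quantifier
            concl = subst(concl[2], concl[1], next(names))
            trace.append("forall")
        elif concl[0] == "=>":            # rule for implication
            addhyp(concl[1])
            concl = concl[2]
            trace.append("=>")
        elif concl[0] == "inc":           # rule defconcl, definition of inclusion
            a, b = concl[1], concl[2]
            concl = ("forall", "X", ("=>", ("elt", "X", a), ("elt", "X", b)))
            trace.append("defconcl")
        else:                             # rule built from the definition of inclusion
            new = [("elt", e[1], i[2])
                   for i in hyps if i[0] == "inc"
                   for e in hyps if e[0] == "elt" and e[2] == i[1]]
            new = [h for h in new if h not in hyps]
            if not new:
                return False, trace       # nothing more to add: give up
            addhyp(new[0])
            trace.append("inc")


TRANSITIVITY = ("forall", "A", ("forall", "B", ("forall", "C",
                ("=>", ("and", ("inc", "A", "B"), ("inc", "B", "C")),
                 ("inc", "A", "C")))))

if __name__ == "__main__":
    print(prove(TRANSITIVITY))
```

The trace produced for the transitivity theorem follows the order of the narrative: three applications of the quantifier rule, the implication rule, defconcl, the quantifier and implication rules again, two applications of the inclusion rule, and the stop rule.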



3 From Muscadet1 to Muscadet2

The first version of Muscadet, which is now called Muscadet1, was described
and analyzed in [4], [5], [6]. The inference engine of Muscadet1 was written
in PASCAL, and knowledge (rules, metarules and super-actions) was written in
a language that was considered simple and declarative. Muscadet1 produced
good results; it was evaluated for several years but its use was limited. In
particular, the language was not adapted to the expression of procedural strategies.

The current version, called Muscadet2, is completely written in PROLOG. 
The reason for this is that it is possible to use the same language to express 
declarative knowledge such as rules, definitions, hypotheses, etc., more proce- 
dural knowledge such as proof strategies, and the inference engine itself. The 
inference engine contains only few predicates since it is completed by the PRO- 
LOG interpreter. This leads to more flexibility, more facilities for writing, and 
even more efficiency. Moreover it was possible to carry out many improvements 
and to write new strategies, which were not possible in the first version. It was 
also possible to use, without having to implement them, all the facilities of ex- 
pression of PROLOG. 

Muscadet2 was able to work on problems of the TPTP
Problem Library (Thousands of Problems for Theorem Provers,
http://www.cs.jcu.edu.au/~tptp). It participated in the competitions CASC-16
and CASC-17. It could of course only compete in the “first order” divisions,
that is FOF (FEQ and NEQ) and SEM, since it does not work with clauses. The
results (http://www.cs.jcu.edu.au/~tptp/CASC) show the complementarity
of Muscadet2 with regard to provers based on the resolution principle.

4 Machine Representations 

PROLOG is not only used as implementation language of Muscadet2, but 
also as representation language to represent mathematical statements, facts and 
rules. Rules express declarative knowledge. Elementary actions and some strate- 
gies define procedural actions. Super-actions group packs of rules for a given 
goal. The inference engine is composed of the PROLOG interpreter and of some 
clauses which process the application of rules. 

Expression of mathematical statements. The logical connectives
and, or, not, =>, <=> are defined as infix operators with precedences
in the order as the connectives are written down in mathematics. They
are right associative. The quantifiers are used as binary prefix operators,
that is for_all(X, <...X>) and exists(X, <...X>). The example
theorem introduced in section 2 is written, with the infix inc operator,
for_all(A,for_all(B,for_all(C,A inc B and B inc C => A inc C))).
The proof of the theorem T will be requested by the PROLOG call prove(T).
The definitions of inclusion and intersection are, with the infix elt operator for
the member relation,

A inc B <=> for_all(X,(X elt A => X elt B))
and A inter B = [X, X elt A and X elt B]

Expression of facts. The fact that the property C is the conclusion of the
(sub-)theorem to be proved with number N is represented by the unit PROLOG
clause concl(N,C) (concl was declared dynamic). Some other properties are
handled in the same manner, such as to be a hypothesis (hyp(N,H)), an object
(obj(N,O)), a sub-theorem (sousth(N,N1)) or any other property that seems
useful.

Expression of rules and super-actions. Here are the machine expressions
of the rule => and partly of the super-action addhyp of section 2.

rule(N, =>) :- concl(N, A => B), addhyp(N, A), newconcl(N, B).
addhyp(N, H) :- ( H = A and B -> addhyp(N, A), addhyp(N, B)
                ; hyp(N, H) -> true
                ; H = for_all(_,_) -> create_nam_ru(rulehyp, Name),
                                      buildrules(H, _, N, Name, [])
                ; assert(hyp(N, H))    % default action
                ).

The parameter N helps to apply a rule to the (sub-)theorem of number N. 

5 How to Use Muscadet2

Muscadet2 is available at the address
http://www.math-info.univ-paris5.fr/~pastre/muscadet/muscadet.html
The PROLOG used is SWI-Prolog, version 3.2.9, which is freeware that can be
downloaded from the following address
http://www.swi.psy.uva.nl/projects/SWI-Prolog/download.html.

Direct proof. The predicate prove may be directly called with the statement
of the theorem to be proved as a parameter. The definitions of mathematical
concepts have to be given beforehand if necessary. For instance, for the first
example, introduce the definition of inclusion and ask for new rules to be built by

op(200, xfy, inc).
assert(definition(A inc B <=> for_all(X,(X elt A => X elt B)))).
buildrules.

Then call

prove(for_all(A, for_all(B, for_all(C,
      A inc B and B inc C => A inc C)))).

Muscadet2 then proves the theorem and displays the trace of the proof. It ends 
by writing that the theorem is proved and gives the length of time for the proof. 

Work with files and libraries. You may also work with files containing a list 
of definitions and a list of theorems to be proved or work with the TPTP problem 
library. Muscadet2 accepts the TPTP syntax but translates statements into 
the syntax that was previously described and which is used for the trace. 

As Muscadet must know if a statement is a definition or a lemma, it analyses 
TPTP axioms and hypotheses and classifies them either as definitions or as 
lemmas. 

6 Elimination of Functional Symbols 

Strategies of Muscadet are designed to work with mathematical or logical
predicates rather than with functional symbols. Nevertheless, Muscadet accepts
statements written with functions, but it “eliminates” them by giving a
name to functional expressions which will replace this expression in the
predicative formula. So, $p(f(a))$ will be replaced by $f(a) : b$ and $p(b)$. The symbol
“:” is used to express that b is the object $f(a)$, and the formula $f(a) : b$ will be
handled as if it was a predicative formula $p_f(a, b)$.

For formulas with variables it is a little more complicated. A statement of
the form $p(f(X))$ where f is a functional symbol means that, for the only Y equal to
$f(X)$, $p(Y)$ is true. It is equivalent to the two following statements
$\forall Y\, (f(X) : Y \Rightarrow p(Y))$ and $\exists Y\, (f(X) : Y \wedge p(Y))$. Depending on the context, one or the
other of these two statements is preferable. The reasons for this are developed
in [4]. So, $p(f(X))$ is replaced by only(f(X):Y, p(Y)) which will be handled
by specific rules.
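
For the ground case, the naming of functional expressions can be sketched as follows. This is a minimal Python illustration, not Muscadet's code: atoms and terms are assumed to be nested tuples such as ("p", ("f", "a")), the caller supplies the fresh object names, and the function names `flatten_term` and `eliminate_functions` are hypothetical. Each functional expression is replaced by a name, and a pair (expression, name) records the corresponding formula f(a) : b.

```python
def flatten_term(term, fresh, facts):
    """Replace a functional term by an object name, recording (term, name) pairs."""
    if isinstance(term, str):
        return term                       # already an object
    functor, *args = term
    flat = (functor, *[flatten_term(a, fresh, facts) for a in args])
    for known, name in facts:
        if known == flat:
            return name                   # this expression was already named
    name = next(fresh)
    facts.append((flat, name))            # stands for the formula  flat : name
    return name


def eliminate_functions(atom, fresh):
    """Return (naming facts, atom whose arguments are all object names)."""
    pred, *args = atom
    facts = []
    flat_args = [flatten_term(a, fresh, facts) for a in args]
    return facts, (pred, *flat_args)


if __name__ == "__main__":
    # p(f(a)) becomes  f(a) : b  together with  p(b)
    print(eliminate_functions(("p", ("f", "a")), iter(["b"])))
    # nested expressions are named from the inside out
    print(eliminate_functions(("p", ("f", ("g", "a"))), iter(["b1", "b2"])))
```

The quantified case with only(...) is deliberately left out of this sketch, since its treatment depends on the specific rules mentioned above.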

7 Metarules

Metarules automatically build rules from the definitions and lemmas. These rules 
are more operational than the definitions and lemmas themselves. 

Other metarules build the list of active rules, which is the list of rules that 
are pertinent to the theorem to be proved. Rules will be tried in the listed order. 
If this order is important, it will have to be stated by metarules. 

8 Some Strategies 

The strategies are rather classic ones. They come from natural deduction. Some- 
times it is necessary to avoid carrying out some treatments too early in order to 
avoid possible infinite branches or too much splitting. 

There are no universal hypotheses since the super-action addhyp, instead of 
adding them, considers them as lemmas and creates new rules which are local 
for the current (sub-)theorem. 

Existential hypotheses lead to the creation of new mathematical objects, but 
this is done very carefully in order to avoid generating infinitely many objects 
in only one direction. 

Disjunctive hypotheses lead to splitting but, as for existential hypotheses, 
this is done not too early and one by one, in order to avoid multiplying splitting 
needlessly. 

9 Perspectives 

The strategies of Muscadet2 will continue to be improved and refined. The 
building of rules will be developed in order to perform new actions while avoiding 
infinite chains of created objects. 

For domains that require mathematical heuristics as data, the work already 
done in [4] will be taken up again, but this time expressing all this knowledge by 
first order statements and writing new metarules capable of deducing effective 
rules. 

Abstract. We present a tool deciding a fragment of set theory. It is
designed to be easily accessible via the internet and intuitively usable 
by anyone who is working with sets to describe and solve problems. 
The tool supplies features which are well-suited for teaching purposes 
as well. It offers a self explaining user interface, a parser reflecting the 
common operator bindings, parse tree visualization, and the possibility 
to generate Venn diagrams as examples or counterexamples for a given 
formula. The implemented decision procedure which is based on the 
semantics of class theory is particularly suitable for this. 

Keywords: Set Theory, Decision Procedures, First-Order Logic. 



1 Motivation 

The language of set theory is one of the most common formal languages. It is
used in research fields ranging from mathematics and computer science through
natural science and engineering to economics. This success is probably due to the
fact that set theory can be used informally in most of its applications and that
the well-known Venn diagrams can illustrate relations and combinations of sets.
In contrast to this, in logic the precise definition of the syntax and semantics
of languages of set theory is crucial, and it provides methods which allow one to
decide automatically whether formulas of set theory are true, satisfiable, or
inconsistent. However, the number of actual users of such methods is far smaller
than the number of users of set theory. This is probably the case because an
average user of the language of set theory is often not aware of such methods
and because existing implementations are not easy to find or to install, and are
often difficult to learn even if the user is familiar with the notation of logic.
The Hilberticus tool is designed to overcome these problems, and it supplies the
user with graphical feedback in the form of Venn diagrams, which eases its use
and illustrates examples and counterexamples of a satisfiable formula in a
convenient way. An alpha version of the tool is accessible via www.hilberticus.de.
The name Hilberticus is a synthesis of the names Hilbert and abacus.

2 Syntax and Semantics of the Language SL 

Syntax of SL. The language SL (Set Language) contains the parentheses symbols
‘(’ and ‘)’, the logical connectives $\wedge$, $\vee$, $\Rightarrow$, and $\Leftrightarrow$, the predicate symbols
$=$, $\subseteq$, and $\subset$, the function symbols $\cap$, $\cup$, and $\setminus$, the constant symbols $\emptyset$ and $D$,
and a countable set of class variables A, B etc. The formulas of SL are defined in
the usual way. A string t of the above symbols is called a term if t is a constant
symbol, if t is a variable, or if t is of the form $(t_1 \bullet t_2)$ where $t_1$ and $t_2$ are
terms and where $\bullet$ denotes a function symbol. A string $\Phi$ is called an atomic
formula if $\Phi$ is of the form $(t_1 = t_2)$, $(t_1 \subseteq t_2)$, or $(t_1 \subset t_2)$ (where $t_1$ and $t_2$
are terms). Finally, we call $\Phi$ a formula if it is a propositional combination of
atomic formulas. To write and read SL-formulas more conveniently we allow
the suppression of parentheses using binding priorities for function symbols and
logical connectives. The binding of the function symbols is defined according to
the list $\cap$, $\cup$, $\setminus$ which is ordered descendingly by binding priority. Similarly, the
binding of the logical connectives is defined according to the list $\wedge$, $\vee$, $\Rightarrow$, $\Leftrightarrow$,
where $\Rightarrow$ and $\Leftrightarrow$ have the same priority and associate to the right, in contrast
to all other binary symbols.

R. Goré, A. Leitsch, and T. Nipkow (Eds.): IJCAR 2001, LNAI 2083, pp. 690-695, 2001.
Springer-Verlag Berlin Heidelberg 2001

Hilberticus - A Tool Deciding an Elementary Sublanguage of Set Theory

Semantics of SL. The semantics of SL is taken from [1,2]. There a constructive
definition of set theory is developed which is based on the Zermelo-Fraenkel
axioms. We denote by $\sigma : \mathcal{V} \to \mathcal{D}$ an assignment from the set of variables $\mathcal{V}$
into the collection of all classes $\mathcal{D}$. The interpretation $I$ of the predicate and
function symbols $=$, $\subseteq$, $\subset$ and $\cap$, $\cup$, $\setminus$ is standard. The constant symbols $\emptyset$ and
$D$ are interpreted as empty and universal class respectively. The domain $\mathcal{D}$ of
all classes and the interpretation $I$ make up the model of interest which we will
denote by $\mathcal{M}$. A formula $\Phi$ is satisfiable in $\mathcal{M}$ if it is true for some assignment
$\sigma$, and $\Phi$ is true in $\mathcal{M}$ if it is true for all assignments $\sigma$.
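
To make the notions of assignment, satisfiability and truth concrete, the following Python sketch evaluates SL formulas over a small finite universe. This is an assumption made purely for illustration: the model of the paper ranges over all classes, and the sketch is not the decision procedure of the tool. Formulas are nested tuples, the operator names ("cap", "cup", "minus", "subseteq", "subset", "implies", ...) and the reserved constants "empty" and "univ" are hypothetical encodings, and "not" is included because negation is used in the definitions of the set operations.

```python
from itertools import combinations, product

UNIVERSE = frozenset({0, 1, 2})  # assumption: a tiny finite stand-in for the domain


def term(t, sigma):
    """Value of a term under the assignment sigma (variable name -> frozenset)."""
    if t == "empty":
        return frozenset()
    if t == "univ":
        return UNIVERSE
    if isinstance(t, str):
        return sigma[t]
    op, left, right = t
    l, r = term(left, sigma), term(right, sigma)
    return {"cap": l & r, "cup": l | r, "minus": l - r}[op]


def holds(f, sigma):
    """Truth value of a formula under the assignment sigma."""
    op = f[0]
    if op == "=":
        return term(f[1], sigma) == term(f[2], sigma)
    if op == "subseteq":
        return term(f[1], sigma) <= term(f[2], sigma)
    if op == "subset":
        return term(f[1], sigma) < term(f[2], sigma)
    if op == "not":
        return not holds(f[1], sigma)
    if op == "and":
        return holds(f[1], sigma) and holds(f[2], sigma)
    if op == "or":
        return holds(f[1], sigma) or holds(f[2], sigma)
    if op == "implies":
        return (not holds(f[1], sigma)) or holds(f[2], sigma)
    if op == "iff":
        return holds(f[1], sigma) == holds(f[2], sigma)
    raise ValueError(f"unknown connective: {op}")


def subsets(universe):
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def counterexample(f, variables):
    """Return an assignment falsifying f over UNIVERSE, or None if there is none."""
    for values in product(subsets(UNIVERSE), repeat=len(variables)):
        sigma = dict(zip(variables, values))
        if not holds(f, sigma):
            return sigma
    return None


if __name__ == "__main__":
    trans = ("implies", ("and", ("subseteq", "A", "B"), ("subseteq", "B", "C")),
             ("subseteq", "A", "C"))
    print(counterexample(trans, ["A", "B", "C"]))                    # no falsifying assignment
    print(counterexample(("=", ("cup", "A", "B"), "A"), ["A", "B"]))  # a falsifying assignment
```

A falsifying assignment found this way plays the role of a counterexample that a Venn diagram could display; the absence of one over this small universe is only evidence, within the sketch, that the formula is true.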



3 The Decision Procedure and Its Implementation 

The decision procedure consists of two steps. Firstly, a formula of SL is
transformed to a formula of the more fundamental language SBL (Set Basis
Language) using the calculus LG of [1,2]. Secondly, the formula in SBL is
decided via the transformation to a propositional formula. The language SBL is
a first-order language without function symbols and with ‘$\in$’ as the only
(uninterpreted) predicate symbol. The calculus SBL is the usual predicate calculus
for this signature and we call a formula valid if it is true in all models of SBL.

The calculus LG. Descriptions of set theory are typically based on a first-order
language with ‘$\in$’ and ‘$=$’ as the only predicate symbols. As soon as the formulas
of this first-order language become large and unreadable, abbreviations are
introduced, not only for formulas but also for terms denoting sets or
set-like objects. The abstraction terms ‘$\{v \mid \Phi\}$’ are used for this purpose.
Generally, the use of abstraction terms is only informal, which can lead to problems
(as discussed in [1]). In LG such problems are avoided by the introduction of
abstraction terms as a part of the language together with axioms and inference
rules for their manipulation. A number of theorems show the equivalence of set
theories based on LG and set theories based on a usual predicate logic. In the
following we focus on some elements of LG and formulate relevant results in
a way appropriate to the scope of this paper. For a thorough investigation of
LG (including the notation of frames) we refer to [1,2]. For our purposes the
following derivations are of interest [2]:



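\[
\vdash_{LG} \{v \mid \Phi\} \in t \Leftrightarrow \exists w\,(w = \{v \mid \Phi\} \wedge w \in t),
\]
\[
\vdash_{LG} t \in \{v \mid \Phi(v)\} \Leftrightarrow \Phi(t), \qquad
\vdash_{LG} t_1 = t_2 \Leftrightarrow \forall w\,(w \in t_1 \Leftrightarrow w \in t_2)
\tag{1}
\]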



By successively applying these formulas it is always possible to eliminate all
abstraction terms of an LG-formula in favor of an equivalent SBL-formula.
The fact that, in our case, the additional axioms and rules for the abstraction
terms serve only as abbreviations is reflected by the following theorem.

Theorem 1. If $\Phi$ is a formula in LG and $\Phi'$ the formula in SBL derived by
the application of the formulas of (1) to $\Phi$, then $\Phi$ is derivable in LG iff $\Phi'$ is
derivable in SBL,

\[
\vdash_{LG} \Phi \quad\text{iff}\quad \vdash_{SBL} \Phi' .
\tag{2}
\]



We can now use the calculus LG to introduce the following abbreviations: 



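\[
\begin{array}{ll}
A \subseteq B \Leftrightarrow_{def} \forall x\,(x \in A \Rightarrow x \in B), & A \cup B =_{def} \{y \mid y \in A \vee y \in B\}, \\
A \subset B \Leftrightarrow_{def} A \subseteq B \wedge \neg(A = B), & A \cap B =_{def} \{y \mid y \in A \wedge y \in B\}, \\
\emptyset =_{def} \{x \mid \neg(x = x)\}, \quad \mathbb{1} =_{def} \{x \mid x = x\}, & A \setminus B =_{def} \{y \mid y \in A \wedge \neg(y \in B)\}
\end{array}
\tag{3}
\]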

To translate a formula in SL to a formula in SBL we replace the predicates
and functions of SL by the corresponding definitions and subsequently apply
the formulas of (1) as shown below. The semantics of SL (Sect. 2) is the same as
the semantics of LG defined in [1]. Due to the soundness and completeness of
the calculus LG [1] a formula of SL is thus true in the underlying model $\mathcal{M}$
if it is derivable in LG. Together with (1) and the soundness and completeness
of the predicate calculus we obtain the result:

Theorem 2. If $\Phi$ is a formula in SL and $\Phi'$ the formula derived after the
expansion of $\Phi$ by the use of the abbreviations of (3) and the successive elimination
of abstraction terms by the use of the formulas of (1), then $\Phi$ is true in $\mathcal{M}$ iff $\Phi'$
is valid in SBL.

Theorem 2 reduces the decidability problem of SL to the decidability problem 
of the calculus SBL. 
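
The translation behind Theorem 2 can be illustrated with a minimal sketch. It is not the Hilberticus implementation: the tuple encoding of formulas, the function names `member` and `translate`, and the ASCII output syntax are assumptions made for this illustration only. The sketch expands $\subseteq$, $=$, $\cup$, $\cap$ and $\setminus$ by the abbreviations of (3) and removes each abstraction term with the second formula of (1); equality is expanded with the third formula of (1).

```python
import itertools

def member(x, term):
    """Return an SBL formula (as a string) for `x in term`.

    A term is either a class variable (a string) or a tuple
    (op, left, right) with op in {"cup", "cap", "diff"} (hypothetical encoding).
    """
    if isinstance(term, str):          # class variable such as "A"
        return f"{x} in {term}"
    op, left, right = term
    if op == "cup":                    # x in {y | y in A or y in B}
        return f"({member(x, left)} | {member(x, right)})"
    if op == "cap":                    # x in {y | y in A and y in B}
        return f"({member(x, left)} & {member(x, right)})"
    if op == "diff":                   # x in {y | y in A and not y in B}
        return f"({member(x, left)} & ~({member(x, right)}))"
    raise ValueError(f"unknown term constructor: {op}")

def translate(formula):
    """Expand an SL formula given as nested tuples into an SBL formula string."""
    fresh = (f"x{i}" for i in itertools.count(1))   # fresh element variables

    def go(f):
        op = f[0]
        if op == "subseteq":           # A subseteq B  :=  forall x (x in A -> x in B)
            x = next(fresh)
            return f"forall {x}.({member(x, f[1])} -> {member(x, f[2])})"
        if op == "eq":                 # third formula of (1)
            x = next(fresh)
            return f"forall {x}.({member(x, f[1])} <-> {member(x, f[2])})"
        if op == "not":
            return f"~{go(f[1])}"
        symbol = {"and": "&", "or": "|", "implies": "->"}[op]
        return f"({go(f[1])} {symbol} {go(f[2])})"

    return go(formula)

# The SL-formula  A cap B subseteq C  ->  (A subseteq C  or  B subseteq C)
example = ("implies",
           ("subseteq", ("cap", "A", "B"), "C"),
           ("or", ("subseteq", "A", "C"), ("subseteq", "B", "C")))
print(translate(example))
```

Only quantifiers over element variables are produced, and class variables occur only to the right of `in`, which is the shape of formula that the propositional transformation relies on.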

Transformation to a propositional formula. As mentioned earlier, the calculus
SBL is a predicate calculus without function symbols and with '$\in$' as the only
predicate symbol. Such a calculus is undecidable in general (see for instance [3]).
The translation procedure described above produces, however, formulas lying in
a certain subset of SBL which we call $SBL_2$. The variables of a formula of
this subset can always be divided into one set of variables occurring only on the
left-hand side of the '$\in$' predicate (element variables) and one set of variables
occurring only on the right-hand side of the '$\in$' predicate (class variables).
Furthermore, only quantifiers for element variables are introduced in the course of the
translation from SL to SBL, because the syntax of SL ensures that the first formula
of (1) is never applied. A formula in $SBL_2$ can be transformed to a propositional
formula $T(p_1, \ldots, p_m)$ where $p_1, \ldots, p_m$ are $SBL_2$-formulas of the form:

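\[
\underbrace{\forall x\,(x \in A_1 \vee \ldots \vee x \in A_n)}_{p_1},\quad
\underbrace{\forall x\,(x \in A_1 \vee \ldots \vee \neg(x \in A_n))}_{p_2},\quad \ldots,\quad
\underbrace{\forall x\,(\neg(x \in A_1) \vee \ldots \vee \neg(x \in A_n))}_{p_m}
\]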

This is possible because, due to the restricted quantification, a universal quantifier
$\forall x$ can be moved (after elimination of $\Rightarrow$ and $\Leftrightarrow$) to the right until a disjunction
of the form $(x \in A_1 \vee \ldots \vee x \in A_k)$ is encountered (where $k \le n$, and atomic
formulas are possibly negated). This can be achieved by the use of the formulas
below, where $\Phi(x)$, $\Psi(x)$ denote formulas containing $x$ as a free variable and
$\Phi$, $\Psi$ are formulas not containing $x$ as a free variable.

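\[
\begin{array}{l}
\vdash_{SBL} \forall x\,(\Phi(x) \wedge \Psi(x)) \Leftrightarrow ((\forall x\,\Phi(x)) \wedge (\forall x\,\Psi(x))), \\
\vdash_{SBL} \forall x\,(\Phi \vee \Psi(x)) \Leftrightarrow (\Phi \vee (\forall x\,\Psi(x))), \qquad
\vdash_{SBL} \forall x\,\Phi \Leftrightarrow \Phi
\end{array}
\tag{4}
\]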
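
A quantified disjunction over only $k \le n$ class variables is equivalent to the conjunction of all p-arguments that extend it, since $\forall x\,\varphi$ is equivalent to $\forall x\,(\varphi \vee x \in B) \wedge \forall x\,(\varphi \vee \neg(x \in B))$. The following sketch computes the indices of those p-arguments. The function name `p_indices` and the dictionary encoding are hypothetical; the numbering assumes the order in which the p-arguments are listed above ($p_1$ has no negation, $p_2$ negates only $A_n$, $p_m$ negates every literal).

```python
from itertools import product

def p_indices(partial, n):
    """Indices i of the p-arguments p_i that extend a partial clause.

    `partial` maps a 0-based variable position j to True when the literal
    for A_(j+1) is negated in the clause; positions missing from `partial`
    do not occur in the clause.
    """
    free = [j for j in range(n) if j not in partial]
    indices = []
    for signs in product([False, True], repeat=len(free)):
        full = dict(partial)
        full.update(zip(free, signs))
        # Read the negation pattern as a binary number, A_n being the lowest bit.
        indices.append(1 + sum(int(full[j]) << (n - 1 - j) for j in range(n)))
    return sorted(indices)

# Variables (A, B, C): the clause [not A, C] expands to p_5 and p_7.
print(p_indices({0: True, 2: False}, 3))
```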



In Example 1 below we demonstrate the complete transformation of an SL- 
formula to a propositional formula. We use the abbreviations of (3), the second 
formula of (1), the formulas of (4), and the notation: 

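\[
P^A_x := x \in A, \qquad \overline{P^A_x} := \neg(x \in A), \qquad
[P^{A_1}_x, \ldots, P^{A_k}_x] := P^{A_1}_x \vee \ldots \vee P^{A_k}_x .
\]

Example 1

\[
\begin{array}{l}
A \cap B \subseteq C \Rightarrow (A \subseteq C \vee B \subseteq C) \\[2pt]
\{v \mid v \in A \wedge v \in B\} \subseteq C \Rightarrow (A \subseteq C \vee B \subseteq C) \\[2pt]
(\forall x\,(x \in \{v \mid v \in A \wedge v \in B\} \Rightarrow x \in C)) \Rightarrow ((\forall y\,(y \in A \Rightarrow y \in C)) \vee (\forall z\,(z \in B \Rightarrow z \in C))) \\[2pt]
(\forall x\,((x \in A \wedge x \in B) \Rightarrow x \in C)) \Rightarrow ((\forall y\,(y \in A \Rightarrow y \in C)) \vee (\forall z\,(z \in B \Rightarrow z \in C))) \\[2pt]
(\forall x\,((P^A_x \wedge P^B_x) \Rightarrow P^C_x)) \Rightarrow ((\forall y\,(P^A_y \Rightarrow P^C_y)) \vee (\forall z\,(P^B_z \Rightarrow P^C_z))) \\[2pt]
\neg(\forall x\,(\neg(P^A_x \wedge P^B_x) \vee P^C_x)) \vee ((\forall y\,(\overline{P^A_y} \vee P^C_y)) \vee (\forall z\,(\overline{P^B_z} \vee P^C_z))) \\[2pt]
\neg(\forall x\,[\overline{P^A_x}, \overline{P^B_x}, P^C_x]) \vee (\forall y\,[\overline{P^A_y}, P^C_y]) \vee (\forall z\,[\overline{P^B_z}, P^C_z]) \\[2pt]
\neg(\forall x\,[\overline{P^A_x}, \overline{P^B_x}, P^C_x]) \vee (\forall y\,([\overline{P^A_y}, P^B_y, P^C_y] \wedge [\overline{P^A_y}, \overline{P^B_y}, P^C_y])) \vee (\forall z\,([P^A_z, \overline{P^B_z}, P^C_z] \wedge [\overline{P^A_z}, \overline{P^B_z}, P^C_z])) \\[2pt]
\neg(\forall x\,[\overline{P^A_x}, \overline{P^B_x}, P^C_x]) \vee (\forall y\,[\overline{P^A_y}, P^B_y, P^C_y] \wedge \forall y\,[\overline{P^A_y}, \overline{P^B_y}, P^C_y]) \vee (\forall z\,[P^A_z, \overline{P^B_z}, P^C_z] \wedge \forall z\,[\overline{P^A_z}, \overline{P^B_z}, P^C_z]) \\[2pt]
\neg(\underbrace{\forall x\,[\overline{P^A_x}, \overline{P^B_x}, P^C_x]}_{p_7}) \vee (\underbrace{\forall x\,[\overline{P^A_x}, P^B_x, P^C_x]}_{p_5} \wedge \underbrace{\forall x\,[\overline{P^A_x}, \overline{P^B_x}, P^C_x]}_{p_7}) \vee (\underbrace{\forall x\,[P^A_x, \overline{P^B_x}, P^C_x]}_{p_3} \wedge \underbrace{\forall x\,[\overline{P^A_x}, \overline{P^B_x}, P^C_x]}_{p_7})
\end{array}
\]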



The resulting propositional formula is $T = \neg p_7 \vee (p_5 \wedge p_7) \vee (p_3 \wedge p_7)$. In general the
formula can have $m = 2^n$ p-arguments if $n$ is the number of different variables
occurring in the SL-formula. The p-arguments are independent of one another
except that they cannot all be true simultaneously. The validity of a formula
in $SBL_2$ can be decided by a Boolean valuation of the propositional formula
$T$, excluding the case that all p-arguments are simultaneously true. Note at this
point that a formula $T$ in which all possible p-arguments occur represents the
worst case. In the great majority of cases only a small subset of the theoretically
possible p-arguments actually occurs in $T$, making the evaluation of $T$ much more
efficient (see Example 1). In the case of a satisfiable SL-formula it is possible to
construct a collection of finite sets from a Boolean valuation
$b : \{p_1, \ldots, p_m\} \to \{\mathit{false}, \mathit{true}\}$ of the corresponding formula $T$. These sets can subsequently
serve to generate Venn diagrams as counterexamples (or examples) of the SL-
formula. A description of such a generation can be found on the homepage of
the tool.
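
The valuation step can be sketched as a brute-force check. This is an illustration under stated assumptions, not the tool's algorithm: the names `decide` and `sets_from_valuation` are hypothetical, the p-arguments are numbered as in Example 1, and all $2^n$ p-arguments are enumerated even if few of them occur in $T$. The set construction uses the observation that a false $p_i$ requires a witness element belonging to exactly those classes whose literal is negated in $p_i$; whether the tool builds its Venn diagrams this way is not stated here.

```python
from itertools import product

def decide(T, n):
    """Classify T over p_1..p_m (m = 2**n) as 'true', 'satisfiable' or 'inconsistent'.

    T takes a tuple p of m booleans, with p[i - 1] the value of p_i.
    The valuation making every p-argument true is skipped.
    """
    m = 2 ** n
    values = [T(p) for p in product([False, True], repeat=m) if not all(p)]
    if all(values):
        return "true"
    if any(values):
        return "satisfiable"
    return "inconsistent"

def sets_from_valuation(p, n):
    """Build n finite sets realizing the valuation p (one witness per false p_i)."""
    patterns = list(product([False, True], repeat=n))   # negation pattern of p_i
    sets = [set() for _ in range(n)]
    for i, (holds, negated) in enumerate(zip(p, patterns), start=1):
        if not holds:
            for j in range(n):
                if negated[j]:          # literal "not (x in A_j)" must fail
                    sets[j].add(i)      # so witness i belongs to A_j
    return sets

# T from Example 1: not p7 or (p5 and p7) or (p3 and p7)
T = lambda p: (not p[6]) or (p[4] and p[6]) or (p[2] and p[6])
print(decide(T, 3))

# A falsifying valuation: p3 and p5 false, all other p-arguments true.
p = tuple(i not in (3, 5) for i in range(1, 9))
A, B, C = sets_from_valuation(p, 3)
print(A, B, C)
```

With these sets, $A \cap B$ is empty while neither $A$ nor $B$ is contained in $C$, so they form a counterexample to the formula of Example 1. Enumerating all valuations is doubly exponential in $n$ and is only meant to make the decision criterion concrete.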



The implementation. The Hilberticus tool decides whether an SL-formula is
true, satisfiable, or inconsistent in the model described in Section 2. Given
an SL-formula, the tool generates a strictly typed abstract syntax tree (AST)
using the syntax and the priorities of functions and logical connectives described
in Section 2. Then the tree is transformed to an AST representing the corresponding
SBL-formula, which is subsequently decided according to the described
procedure. The tool is written in Java to be easily accessible. It is made up of
different modules which are tested and bound together within the Electronic Tool
Integration platform (ETI) [4], the experimental platform of the Int. J. STTT.
The integrated parser was generated using SableCC, a suitable Java compiler
compiler [5]. The ASTs of SL- and SBL-formulas can be visualized using the
PLGraph class library, which is supplied by the ETI platform.



4 Related and Future Work 

The language SL is a sublanguage of a language first described in [6] and later 
named Multi-level Syllogistic (MLS), see for instance [7]. Later on, various vari- 
ants of (MLS) have been shown to be decidable. The most recent decision pro- 
cedures use semantic tableaux [7]. The semantics of the languages is based on 
Zermelo-Fraenkel set theory or parts of it. The sublanguage of MLS which is 
the most similar to SL is called 2LS and is described together with a decision 
procedure in [8]. As mentioned earlier the Hilberticus tool is the first implemen- 
tation of the decision procedure described in Section 3. It was chosen because it 
supplies a convenient way to obtain finite sets for Venn diagram generation and 
because of the calculus LG offering a natural possibility to extend the language. 
The use of abstraction terms makes it an easy task to introduce new function 
symbols. With a test implementation containing a generalized decision procedure
we were able to find the incorrect formula $\bigcup (M \cap N) = \bigcup M \cap \bigcup N$ in [9,
p.545], a book which is used as reference for all kinds of mathematical formulas.
We are currently working on a version of the tool which uses tableaux based 
procedures such as mentioned above and the described translation to SBL in 
combination with decision (or verification) procedures for predicate and monadic 
logic. In this context the ETI platform supplies the ideal environment for the 
integration, comparison, and testing of these translation and decision procedures. 
Abstract. The STRIP system is a theorem prover for intuitionistic
propositional logic with two main characteristics: it deals with the
duplication of formulae during proof-search through a fine and explicit
management of formulae (as resources) based on structural sharing, and it
builds, for a given formula, either a proof or a countermodel.



1 Introduction 

In recent years there has been a renewed interest in proof-search for constructive
logics like intuitionistic logic (IL), mainly because of research in intuitionistic type
theories and their relationships with programming through proof-search.
Different methods (based on resolution, connections, translation into classical logic,
constraint calculi) and implementations have already been designed for IL, but
our aim in this work is to focus on two main problems: to avoid the duplication
of formulae during proof-search and to efficiently build countermodels in case of
non-provability. First, we consider the propositional fragment of IL (IPL), but
our main goal is to define structural solutions general enough to be applicable
to other substructural or intermediate logics [1]. A good and efficient explicit
management of formulae (as resources), both in the logical system and in the
implementation, is important to have reliable and efficient implementation
techniques for logical calculi (and connected proof-search methods), for instance in
imperative programming languages like C or Java. We have already studied this
point in [3] for the contraction-free sequent calculus LJT [2], for which there are
refinements in order to solve the duplication problem [1,2,5]. The STRIP system,
available at http://www.loria.fr/~larchey/STRIP, decides the provability of
a given IPL sequent and then builds a proof or a countermodel (as a Kripke
tree). It is based on a new logical system, named SLJ [4], and on a structural
solution of the duplication problem (without the introduction of new formulae
and variables like in [2,5]). In order to illustrate and emphasize the interest and
the results of structural sharing and its implementation we have compared, on
various IPL formulae, the STRIP system with Porgi¹, a similar prover for IPL
written in SML, and then with the ft² system, which is not based on LJT but is
written in C like our system.

¹ available at http://www.cis.ksu.edu/~allen/porgi.html
² available at http://www.sics.se/isl/ft.html

R. Gore, A. Leitsch, and T. Nipkow (Eds.): IJCAR 2001, LNAI 2083, pp. 696—700, 2001. 
\[
\frac{\Gamma, A, B \to C \vdash B \qquad \Gamma, C \vdash G}{\Gamma, (A \to B) \to C \vdash G}
\qquad\qquad
\frac{\Gamma, A \to C, B \to C \vdash G}{\Gamma, (A \vee B) \to C \vdash G}
\]

Fig. 1. Rules and duplication



2 Formulae Duplication and Sharing 

In LJT [2], two kinds of formulae duplication appear even though the system
is contraction-free: these are illustrated in the rules of figure 1. Let us give a
brief overview of the results and techniques presented in [4]. The duplication on
the lhs is treated as in [5], introducing a mark and a so-called boxed sequent:
$\Gamma, A, B^{*} \to C \vdash \Box$ with the intended meaning of $\Gamma, A, B \to C \vdash B$.

The duplication on the rhs is addressed in a structural way. Whereas the lhs part of
a sequent is usually considered to be a flat list of formulae, we use a list of trees, i.e. a
formulae-indexed forest. Thus, the sequents are represented by specific trees in which
formulae are paths from roots to leaves and logical operations are operations on the
tree leaves. The problem of duplication is then a problem of structural sharing in such
trees. Similar ideas can be applied to a refutability system in order to generate
countermodels in case of non-provability [6]. By such an approach, there is no formulae
duplication anymore: each subformula is used at most once in a proof-search branch.
During proof-search the structure of the forest changes but not its size. The STRIP
system, which provides proofs or countermodels for IPL formulae, is based on structural
sharing techniques with the following results: no dynamic memory allocation, a
finer control on the resources and an $O(n \log n)$-space algorithm for provability
[4].

Fig. 2. Logical rule (the tree for $(A \vee B) \to C$, a root $C$ with the single child $A \vee B$, is rewritten into the tree for $A \to C, B \to C$, a root $C$ with the two children $A$ and $B$)
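
For orientation, the contraction-free calculus LJT [2] mentioned above can be sketched as a naive backtracking decision procedure. This is not STRIP and not the system SLJ: it copies contexts as plain lists, so it exhibits exactly the duplication that structural sharing is meant to avoid, and it builds neither proofs nor countermodels. The tuple encoding of formulas and the names `prove`, `imp`, `neg` and `BOT` are assumptions made for this illustration.

```python
BOT = "bot"                       # falsum; every other string is an atom

def imp(a, b):
    return ("imp", a, b)

def neg(a):
    return imp(a, BOT)

def prove(gamma, goal):
    """Decide the IPL sequent gamma |- goal with the rules of LJT."""
    gamma = list(gamma)
    # Invertible right rules.
    if isinstance(goal, tuple) and goal[0] == "imp":
        return prove(gamma + [goal[1]], goal[2])
    if isinstance(goal, tuple) and goal[0] == "and":
        return prove(gamma, goal[1]) and prove(gamma, goal[2])
    # Axioms.
    if BOT in gamma or goal in gamma:
        return True
    # Invertible left rules: apply the first one that matches.
    for i, f in enumerate(gamma):
        rest = gamma[:i] + gamma[i + 1:]
        if not isinstance(f, tuple):
            continue
        if f[0] == "and":
            return prove(rest + [f[1], f[2]], goal)
        if f[0] == "or":
            return prove(rest + [f[1]], goal) and prove(rest + [f[2]], goal)
        a, b = f[1], f[2]          # f is an implication a -> b
        if a == BOT:
            return prove(rest, goal)
        if isinstance(a, str):
            if a in rest:          # atom a is available: replace a -> b by b
                return prove(rest + [b], goal)
        elif a[0] == "and":        # (c & d) -> b  becomes  c -> (d -> b)
            return prove(rest + [imp(a[1], imp(a[2], b))], goal)
        elif a[0] == "or":         # (c | d) -> b  becomes  c -> b, d -> b
            return prove(rest + [imp(a[1], b), imp(a[2], b)], goal)
    # Non-invertible rules: backtrack over the choices.
    if isinstance(goal, tuple) and goal[0] == "or":
        if prove(gamma, goal[1]) or prove(gamma, goal[2]):
            return True
    for i, f in enumerate(gamma):
        if (isinstance(f, tuple) and f[0] == "imp"
                and isinstance(f[1], tuple) and f[1][0] == "imp"):
            rest = gamma[:i] + gamma[i + 1:]
            c, d, b = f[1][1], f[1][2], f[2]
            # (c -> d) -> b : prove c -> d from d -> b, then continue with b.
            if prove(rest + [imp(d, b)], imp(c, d)) and prove(rest + [b], goal):
                return True
    return False

excluded_middle = ("or", "a", neg("a"))
print(prove([], neg(neg(excluded_middle))))   # provable intuitionistically
print(prove([], excluded_middle))             # not provable
```

In the rule for `(c | d) -> b` the formula `b` is copied into two new implications, and in the rule for `(c -> d) -> b` the formula `d` appears both in `d -> b` and in the new goal; these are the two duplications of figure 1.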

3 Structures and Strategies 

The formulae-indexed forest data structure has to be implemented in such a way
that the administrative and logical operations, and those related to strategies, take
as little time as possible. The leaves are chained into a list to provide fast access to
active formulae (which are those indexing leaves). The STRIP system includes
two different implementations of this structure, depending on the way the
operations of cutting or pasting subtrees are handled. In the first one, called lrmost, one
memorizes for each node the indexes of the leftmost and rightmost leaves under
this node. In the second one, called index-scope, one computes for each node
its scope, that is, the greatest index among the indexes that could potentially be
under the current node. Moreover the system proposes two proof-search methods
(or strategies for the choice of the leaf to develop at each step). The strategy
called first-leaf chooses the first active leaf from a left-to-right search in the set
 4   $((A \to (B \vee (B \to C))) \to C) \to C$
 6   $B \to ((A \to B \to C) \to C) \to (A \to B \to C)$
 7   $(((A \wedge B) \vee C) \to (C \vee (C \wedge D))) \to (\neg A \vee ((A \vee B) \to C))$
 8   $((A \to B \vee A \to C \vee B \to C) \to (A \wedge B \wedge C)) \to (A \wedge B \wedge C)$
13   $\neg\neg((\neg A \to B) \to (\neg A \to \neg B) \to A)$
14   $(\neg(A \to (B \vee C))) \to (B \vee C) \to A$
15   $\neg\neg A \vee (A \to \neg\neg B \vee (B \to \neg\neg C \vee (C \to (\neg\neg B \to B) \vee \neg B \vee \neg\neg D)))$
20   $(((G \to A) \to J) \to D \to E) \to (((B \to B) \to I) \to C \to J)$
     $\to (A \to B) \to F \to G \to (((C \to B) \to I) \to B) \to (A \to C)$
     $\to (((F \to A) \to B) \to I) \to E$
21   $\neg\neg(((A \leftrightarrow B) \leftrightarrow C) \leftrightarrow (A \leftrightarrow (B \leftrightarrow C)))$
22   $((\neg\neg(\neg A \vee \neg B) \to (\neg A \vee \neg B)) \to (\neg\neg(\neg A \vee \neg B) \vee \neg(\neg A \vee \neg B)))$
     $\to (\neg\neg(\neg A \vee \neg B) \vee \neg(\neg A \vee \neg B))$
24   Pigeonhole 2-3
25   Pigeonhole 3-4

Fig. 3. Comparison: STRIP vs Porgi (test formulae)



of leaves. The strategy, called rule-prec, also considers such a left-to-right search 
but the set of leaves is split in different groups having different priorities and 
then it selects the first active leaf in the group with the highest priority. With the 
rule-prec strategy one always builds a countermodel in case of non-provability 
because the invertible rules are applied before the non-invertible rules. 
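
The two selection strategies can be sketched as follows. The `Leaf` record, its fields and the numeric `priority` are hypothetical; the sketch assumes that a larger priority marks the group to be served first (for instance leaves whose rule is invertible) and that rule-prec picks the highest-priority group that still contains an active leaf.

```python
from dataclasses import dataclass

@dataclass
class Leaf:
    index: int       # position in the left-to-right chain of leaves
    active: bool     # whether a rule can currently be applied to this leaf
    priority: int    # hypothetical group priority; larger means earlier

def first_leaf(leaves):
    """Return the first active leaf in left-to-right order, or None."""
    return next((leaf for leaf in leaves if leaf.active), None)

def rule_prec(leaves):
    """Return the first active leaf of the highest-priority non-empty group."""
    active = [leaf for leaf in leaves if leaf.active]
    if not active:
        return None
    best = max(leaf.priority for leaf in active)
    return next(leaf for leaf in active if leaf.priority == best)

leaves = [Leaf(0, False, 2), Leaf(1, True, 1), Leaf(2, True, 2), Leaf(3, True, 2)]
print(first_leaf(leaves).index, rule_prec(leaves).index)
```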

4 Results and Comparisons 

For a given sequent, STRIP can decide its provability and then build a proof
or a countermodel (as a Kripke tree). The user can select the forest
implementation (lrmost or index-scope) and the strategy (first-leaf or rule-prec).
The system provides various statistics about the search, like the number of rule
applications (in order to evaluate the efficiency of strategies) or the number
of performed operations (in order to determine if the forest implementation
induces a huge overhead). We have compared the STRIP system with two provers,
namely Porgi [8] and ft [7].

Porgi is a proof-or-refutation generator, written in SML, that is based on LJT
with a strategy close to the rule-prec strategy. It has a very simple lexical
analyzer and thus we have only been able to test it with small formulae.

Some comparisons between Porgi and STRIP are given in figure 3 with formulae
that are provable (p) or unprovable (u). The left part of the table presents
execution times T for both systems (in seconds for 10^ loops, CPU time only). The
exponent 'lr' (resp. 'is') corresponds to the lrmost (resp. index-scope)
implementation of the forest. The suffix 'fl' (resp. 'rp') corresponds to the first-leaf
(resp. rule-prec) strategy. The right part presents measures of the space size
and search costs for STRIP proof-search. The expressions SS and SC respectively
represent the number of logical rule applications and the total number of
operations during the proof-search.

Fig. 4. Comparison: STRIP vs ft

The STRIP system is always more efficient with the rule-prec strategy, which
is almost the same as the one implemented in Porgi. Moreover, the difference is
significant for the pigeonhole formulae. We can also compare the strategies.
In examples 20 and 22, we see that first-leaf is much less efficient than
rule-prec because of the size of the proof-search space, and thus in this case
STRIP is slower than Porgi. The pigeonhole examples illustrate the actual
impact of sharing techniques and show that some tasks, like forest management, are
implemented much more efficiently in our system. From the SC statistics, i.e.
measures of the search cost, we observe that the index-scope implementation
of the forest is always better than the lrmost one but that the difference does
not exceed a factor of 2, whichever strategy is chosen. However, it is clear that
efficient management of formulae becomes crucial on larger scales.

The ft system is suited for first-order logic but has a propositional subsystem for
IPL. This system is not based on LJT and does not build countermodels, but it
is written in C like STRIP. Some comparisons between STRIP and ft are given
in figure 4. In order to analyze the impact of sharing techniques on proof-search,
we have considered instances on finite domains of provable first-order formulae
and also pigeonhole examples. The Dom. column represents the size of the
domain or the number of pigeons and the Size column represents the size of the
generated formulae. We observe that, in general, STRIP is much faster than ft
for the first kind of examples. But for the pigeonhole examples both systems are
on par, with a slight but decreasing advantage to ft. In fact they include very few
implications ($\to$) and the structural sharing is such that left implication rules
may cut down the problems by large amounts. In this case, the choice of a light
strategy is important. We observe that with the rule-prec strategy, STRIP
spends 85% of its computation time looking for the active formulae. With some
cyclic variants of first-leaf, we can cut this time down to 65%, which is not
optimal but nevertheless better, thus being close to ft and even better on the
larger case (7-8). Anyway, we see that the larger the pigeonhole problem is, the
better STRIP behaves compared to ft.

Further work will be devoted to other tests and comparisons but, given our
positive results, our main goal is to apply or extend these implementation
techniques (forest representation, structural sharing, forest implementation) to other
substructural or intermediate logics [1] and thus to provide efficient provers that
build proofs or countermodels for such logics.
Abstract. RACER implements a TBox and ABox reasoner for the logic
$\mathcal{SHIQ}$. RACER was the first full-fledged ABox description logic system
for a very expressive logic and is based on optimized sound and complete
algorithms. RACER also implements a decision procedure for modal logic
satisfiability problems (possibly with global axioms).



1 Introduction 

The description logic (DL) $\mathcal{SHIQ}$ [18] extends the logic $\mathcal{ALCNH}_{R^+}$ [9] by
additionally providing qualified number restrictions and inverse roles. $\mathcal{ALCNH}_{R^+}$
was the logic supported by RACE (Reasoner for ABoxes and Concept
Expressions), the precursor of RACER (Renamed ABox and Concept Expression
Reasoner). Using the $\mathcal{ALCNH}_{R^+}$ naming scheme, $\mathcal{SHIQ}$ could be called
$\mathcal{ALCQHI}_{R^+}$ (pronunciation: ALC-choir).

$\mathcal{ALCQHI}_{R^+}$ is briefly introduced as follows. We assume a set of concept names
$C$, a set of role names $R$, and a set of individual names $O$. The mutually disjoint
subsets $P$ and $T$ of $R$ denote non-transitive and transitive roles, respectively
($R = P \cup T$). $\mathcal{ALCQHI}_{R^+}$ is introduced in Figure 1 using a standard Tarski-
style semantics. The term $\top$ ($\bot$) is used as an abbreviation for $\mathsf{C} \sqcup \neg\mathsf{C}$ ($\mathsf{C} \sqcap \neg\mathsf{C}$).

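If $\mathsf{R}, \mathsf{S} \in R$ are role names, then $\mathsf{R} \sqsubseteq \mathsf{S}$ is called a role inclusion axiom. A role
hierarchy $\mathcal{R}$ is a finite set of role inclusion axioms. Then, we define $\sqsubseteq^*$ as the
reflexive transitive closure of $\sqsubseteq$ over such a role hierarchy $\mathcal{R}$. Given $\sqsubseteq^*$, the set
of roles $\mathsf{R}^{\downarrow} = \{\mathsf{S} \in R \mid \mathsf{S} \sqsubseteq^* \mathsf{R}\}$ defines the sub-roles of a role $\mathsf{R}$. We also define
the set $S := \{\mathsf{R} \in P \mid \mathsf{R}^{\downarrow} \cap T = \emptyset\}$ of simple roles that are neither transitive nor
have a transitive role as sub-role.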
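
The sub-role and simple-role definitions can be made concrete with a short sketch. The function names and the encoding of a role hierarchy as a set of `(sub, super)` pairs are assumptions for this illustration and say nothing about how RACER represents role hierarchies internally.

```python
def sub_roles(role, inclusions):
    """Return all roles S with S below `role` in the reflexive transitive closure.

    `inclusions` is a set of pairs (s, r), each standing for one role
    inclusion axiom "s is included in r".
    """
    result = {role}
    frontier = [role]
    while frontier:
        current = frontier.pop()
        for sub, sup in inclusions:
            if sup == current and sub not in result:
                result.add(sub)
                frontier.append(sub)
    return result

def simple_roles(non_transitive, transitive, inclusions):
    """Non-transitive roles that have no transitive role among their sub-roles."""
    return {r for r in non_transitive
            if not (sub_roles(r, inclusions) & transitive)}

# Hypothetical hierarchy: the transitive role "partOf" is included in "relatedTo".
inclusions = {("partOf", "relatedTo"), ("directPartOf", "partOf")}
print(sub_roles("relatedTo", inclusions))
print(simple_roles({"relatedTo", "directPartOf", "hasColor"}, {"partOf"}, inclusions))
```

In this example number restrictions would be allowed on `directPartOf` and `hasColor` but not on `relatedTo`, which has the transitive sub-role `partOf`.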

The concept language of $\mathcal{ALCQHI}_{R^+}$ syntactically restricts the combination of
number restrictions and transitive roles. Number restrictions are only allowed
for simple roles. This restriction is motivated by a known undecidability result
in case of an unrestricted syntax [17]. In concepts, instead of a role name $\mathsf{R}$ (or
$\mathsf{S}$), the inverse role $\mathsf{R}^{-1}$ (or $\mathsf{S}^{-1}$) may be used.

If $\mathsf{C}$ and $\mathsf{D}$ are concepts, then $\mathsf{C} \sqsubseteq \mathsf{D}$ is a terminological axiom (generalized
concept inclusion or GCI). A finite set of terminological axioms $\mathcal{T}_{\mathcal{R}}$ is called a
terminology or TBox w.r.t. a given role hierarchy $\mathcal{R}$.¹ An ABox $\mathcal{A}$ is a finite
set of assertional axioms as defined in Figure 1.

¹ The reference to $\mathcal{R}$ is omitted in the following if we use $\mathcal{T}$.
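Concepts
  Syntax                          Semantics
  $\mathsf{A}$                    $\mathsf{A}^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$
  $\neg\mathsf{C}$                $\Delta^{\mathcal{I}} \setminus \mathsf{C}^{\mathcal{I}}$
  $\mathsf{C} \sqcap \mathsf{D}$  $\mathsf{C}^{\mathcal{I}} \cap \mathsf{D}^{\mathcal{I}}$
  $\mathsf{C} \sqcup \mathsf{D}$  $\mathsf{C}^{\mathcal{I}} \cup \mathsf{D}^{\mathcal{I}}$
  $\exists \mathsf{R}.\mathsf{C}$ $\{a \in \Delta^{\mathcal{I}} \mid \exists b \in \Delta^{\mathcal{I}} : (a, b) \in \mathsf{R}^{\mathcal{I}},\ b \in \mathsf{C}^{\mathcal{I}}\}$
  $\forall \mathsf{R}.\mathsf{C}$ $\{a \in \Delta^{\mathcal{I}} \mid \forall b \in \Delta^{\mathcal{I}} : (a, b) \in \mathsf{R}^{\mathcal{I}} \Rightarrow b \in \mathsf{C}^{\mathcal{I}}\}$
  $\exists_{\geq n} \mathsf{S}.\mathsf{C}$  $\{a \in \Delta^{\mathcal{I}} \mid \|\{y \mid (a, y) \in \mathsf{S}^{\mathcal{I}},\ y \in \mathsf{C}^{\mathcal{I}}\}\| \geq n\}$
  $\exists_{\leq n} \mathsf{S}.\mathsf{C}$  $\{a \in \Delta^{\mathcal{I}} \mid \|\{y \mid (a, y) \in \mathsf{S}^{\mathcal{I}},\ y \in \mathsf{C}^{\mathcal{I}}\}\| \leq n\}$

Roles
  Syntax                          Semantics
  $\mathsf{R}$                    $\mathsf{R}^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$

$\mathsf{A}$ is a concept name and $\|\cdot\|$ denotes the cardinality of
a set. Furthermore, we assume that $\mathsf{R} \in R$ and $\mathsf{S} \in S$.

Axioms
  Syntax                              Satisfied if
  $\mathsf{R} \in T$                  $\mathsf{R}^{\mathcal{I}} = (\mathsf{R}^{\mathcal{I}})^{+}$
  $\mathsf{R} \sqsubseteq \mathsf{S}$ $\mathsf{R}^{\mathcal{I}} \subseteq \mathsf{S}^{\mathcal{I}}$
  $\mathsf{C} \sqsubseteq \mathsf{D}$ $\mathsf{C}^{\mathcal{I}} \subseteq \mathsf{D}^{\mathcal{I}}$

Assertions
  Syntax                          Satisfied if
  $a : \mathsf{C}$                $a^{\mathcal{I}} \in \mathsf{C}^{\mathcal{I}}$
  $(a, b) : \mathsf{R}$           $(a^{\mathcal{I}}, b^{\mathcal{I}}) \in \mathsf{R}^{\mathcal{I}}$

Fig. 1. Syntax and Semantics of $\mathcal{ALCQHI}_{R^+}$.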



An interpretation $\mathcal{I}$ is a model of a concept $\mathsf{C}$ (or satisfies a concept $\mathsf{C}$) iff
$\mathsf{C}^{\mathcal{I}} \neq \emptyset$ and for all $\mathsf{R} \in R$ it holds that iff $(x, y) \in \mathsf{R}^{\mathcal{I}}$ then $(y, x) \in (\mathsf{R}^{-1})^{\mathcal{I}}$. An
interpretation is a model of a TBox $\mathcal{T}$ iff it satisfies all axioms in $\mathcal{T}$. See Figure
1 for the satisfiability conditions. An interpretation is a model of an ABox $\mathcal{A}$
w.r.t. a TBox iff it is a model of $\mathcal{T}$ and satisfies all assertions in $\mathcal{A}$. Different
individuals are mapped to different domain objects (unique name assumption).
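
The Tarski-style semantics of Figure 1 can be read as a recursive evaluation over a finite interpretation. The sketch below only evaluates concepts in one given interpretation; it is not a reasoner and has nothing to do with RACER's tableau algorithm. The tuple encoding of concepts, the dictionary layout of the interpretation and the names `ext` and `role_ext` are hypothetical.

```python
def role_ext(role, interp):
    """Extension of a role name or of an inverse role ("inv", name)."""
    if isinstance(role, tuple) and role[0] == "inv":
        return {(y, x) for (x, y) in interp["roles"][role[1]]}
    return interp["roles"][role]

def ext(concept, interp):
    """Extension of a concept in a finite interpretation.

    Concepts are concept names (strings) or tuples:
    ("not", C), ("and", C, D), ("or", C, D), ("some", R, C), ("all", R, C),
    ("atleast", n, S, C), ("atmost", n, S, C).
    """
    domain = interp["domain"]
    if isinstance(concept, str):
        return set(interp["concepts"].get(concept, set()))
    op = concept[0]
    if op == "not":
        return domain - ext(concept[1], interp)
    if op == "and":
        return ext(concept[1], interp) & ext(concept[2], interp)
    if op == "or":
        return ext(concept[1], interp) | ext(concept[2], interp)
    if op in ("some", "all"):
        pairs, filler = role_ext(concept[1], interp), ext(concept[2], interp)
        if op == "some":
            return {a for a in domain
                    if any(x == a and y in filler for x, y in pairs)}
        return {a for a in domain
                if all(y in filler for x, y in pairs if x == a)}
    if op in ("atleast", "atmost"):
        n = concept[1]
        pairs, filler = role_ext(concept[2], interp), ext(concept[3], interp)

        def count(a):
            return len({y for x, y in pairs if x == a and y in filler})

        if op == "atleast":
            return {a for a in domain if count(a) >= n}
        return {a for a in domain if count(a) <= n}
    raise ValueError(f"unknown concept constructor: {op}")

interp = {
    "domain": {1, 2, 3},
    "concepts": {"Person": {1, 2, 3}, "Female": {2}},
    "roles": {"hasChild": {(1, 2), (1, 3)}},
}
print(ext(("atleast", 2, "hasChild", "Person"), interp))
print(ext(("some", ("inv", "hasChild"), "Person"), interp))
```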

2 Inference Services 

In the following we define several inference services offered by RACER. 

A concept $\mathsf{C}$ is called consistent (w.r.t. a TBox $\mathcal{T}$) iff there exists a model of $\mathsf{C}$
(that is also a model of $\mathcal{T}$ and $\mathcal{R}$). An ABox $\mathcal{A}$ is consistent (w.r.t. a TBox $\mathcal{T}$)
iff $\mathcal{A}$ has a model $\mathcal{I}$ (which is also a model of $\mathcal{T}$). A knowledge base $(\mathcal{T}, \mathcal{A})$ is called
consistent iff there exists a model for $\mathcal{A}$ which is also a model for $\mathcal{T}$. A concept,
ABox, or knowledge base that is not consistent is called inconsistent.

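A concept $\mathsf{D}$ subsumes a concept $\mathsf{C}$ (w.r.t. a TBox $\mathcal{T}$) iff $\mathsf{C}^{\mathcal{I}} \subseteq \mathsf{D}^{\mathcal{I}}$ for all
interpretations $\mathcal{I}$ (that are models of $\mathcal{T}$). If $\mathsf{D}$ subsumes $\mathsf{C}$, then $\mathsf{C}$ is said to be
subsumed by $\mathsf{D}$.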

Besides these basic problems, some additional inference services are provided by 
description logic systems. A basic reasoning service is to compute the subsump- 
tion relationship between concept names (i.e. elements from C). This inference 
is needed to build a hierarchy of concept names w.r.t. specificity. The problem 
of computing the most-specific concept names mentioned in T that subsume a 
certain concept is known as computing the parents of a concept. The children 
are the most-general concept names mentioned in T that are subsumed by a certain
concept. We use the name concept ancestors (concept descendants) for the
transitive closure of the parents (children) relation. The computation of the par- 
ents and children of every concept name is also called classification of the TBox. 
Another important inference service for practical knowledge representation is 
to check whether a certain concept name is inconsistent. Usually, inconsistent 
concept names are the consequence of modeling errors. Checking the consistency 
of all concept names mentioned in a TBox without computing the parents and 
children is called a TBox coherence check. 
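
Given any subsumption test, parents and children can be computed by filtering, as in the following sketch. The function names and the `subsumes(d, c)` oracle are hypothetical, the toy oracle below simply follows told parent links, and the sketch assumes that no two distinct concept names are equivalent. Actual classifiers avoid most of the pairwise tests performed here.

```python
def parents(concept, names, subsumes):
    """Most specific names that subsume `concept`; subsumes(d, c) means d subsumes c."""
    above = [d for d in names if d != concept and subsumes(d, concept)]
    return {d for d in above
            if not any(e != d and subsumes(d, e) for e in above)}

def children(concept, names, subsumes):
    """Most general names that are subsumed by `concept`."""
    below = [c for c in names if c != concept and subsumes(concept, c)]
    return {c for c in below
            if not any(e != c and subsumes(e, c) for e in below)}

def classify(names, subsumes):
    """Map every concept name to its (parents, children) pair."""
    return {c: (parents(c, names, subsumes), children(c, names, subsumes))
            for c in names}

# Toy oracle built from told parent links (hypothetical data).
told = {"Person": set(), "Woman": {"Person"}, "Parent": {"Person"},
        "Mother": {"Woman", "Parent"}}

def told_subsumes(d, c):
    return d == c or any(told_subsumes(d, p) for p in told[c])

hierarchy = classify(list(told), told_subsumes)
print(hierarchy["Mother"])
```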

If the description logic supports full negation, consistency and subsumption can
be mutually reduced to each other since $\mathsf{D}$ subsumes $\mathsf{C}$ (w.r.t. a TBox $\mathcal{T}$) iff
$\mathsf{C} \sqcap \neg\mathsf{D}$ is inconsistent (w.r.t. $\mathcal{T}$) and $\mathsf{C}$ is inconsistent (w.r.t. $\mathcal{T}$) iff $\mathsf{C}$ is subsumed
by $\bot$ (w.r.t. $\mathcal{T}$). Consistency of concepts can be reduced to ABox consistency
as follows: A concept $\mathsf{C}$ is consistent (w.r.t. a TBox $\mathcal{T}$) iff the ABox $\{a : \mathsf{C}\}$ is
consistent (w.r.t. $\mathcal{T}$). An individual $i$ is an instance of a concept $\mathsf{C}$ (w.r.t. a
TBox $\mathcal{T}$ and an ABox $\mathcal{A}$) iff $i^{\mathcal{I}} \in \mathsf{C}^{\mathcal{I}}$ for all models $\mathcal{I}$ (of $\mathcal{T}$ and $\mathcal{A}$). Again, for
description logics that support full negation for concepts, the instance problem
can be reduced to the problem of deciding if the ABox $\mathcal{A} \cup \{i : \neg\mathsf{C}\}$ is inconsistent
(w.r.t. $\mathcal{T}$). This test is also called instance checking. The most-specific concept
names mentioned in a TBox $\mathcal{T}$ that an individual is an instance of are called
the direct types of the individual w.r.t. a knowledge base $(\mathcal{T}, \mathcal{A})$. The direct
types inference problem can be reduced to subsequent instance problems. The
retrieval inference problem is to find all individuals mentioned in an ABox that
are an instance of a certain concept $\mathsf{C}$. The set of fillers of a role $\mathsf{R}$ for an
individual $i$ w.r.t. a knowledge base $(\mathcal{T}, \mathcal{A})$ is defined as $\{x \mid (\mathcal{T}, \mathcal{A}) \models (i, x) : \mathsf{R}\}$
where $(\mathcal{T}, \mathcal{A}) \models ax$ means that all models of $\mathcal{T}$ and $\mathcal{A}$ are also models of $ax$.
The set of roles between two individuals $i$ and $j$ w.r.t. a knowledge base $(\mathcal{T}, \mathcal{A})$
is defined as $\{\mathsf{R} \mid (\mathcal{T}, \mathcal{A}) \models (i, j) : \mathsf{R}\}$.

As in other systems, there are some auxiliary queries supported: retrieval of the 
concept names or individuals mentioned in a knowledge base, retrieval of the set 
of roles, retrieval of the role parents and children (defined analogously to the 
concept parents and children, see above), retrieval of the set of individuals in 
the domain and in the range of a role, etc. As a distinguishing feature to other 
systems, which is important for many applications, we would like to emphasize 
that RACER supports multiple TBoxes and ABoxes. Assertions can be added 
to ABoxes after queries have been answered. In addition, RACER also provides 
support for retraction of assertions in particular ABoxes. The inference services 
supported by RACER for TBoxes and ABoxes are described in detail in [11]. 

3 The RACER Architecture 

The ABox consistency algorithm implemented in the RACER system is based
on the tableaux calculus of its precursor RACE [9]. For dealing with qualified
number restrictions and inverse roles, the techniques introduced in the tableaux
calculus for $\mathcal{SHIQ}$ [18] are employed.

However, optimized search techniques are required in order to guarantee good
average-case performance. The RACER architecture incorporates the following
standard optimization techniques: dependency-directed backtracking [22] and
DPLL-style semantic branching (see [6] for an overview of the literature). Among
a set of new optimization techniques, the integration of these techniques into DL
reasoners for concept consistency has been described in [15]. The implementation
of these techniques in the ABox reasoner RACER differs from the implementation
of other DL systems, which provide only concept consistency (and TBox)
reasoning. The latter systems have to consider only so-called "labels" (sets of
concepts) whereas an ABox prover such as RACER has to explicitly deal with
individuals (nominals). ABox optimizations are also explained in [8].

The techniques for TBox reasoning described in [3] (marking and propagation as 
well as lazy unfolding) are also supported by RACER. As indicated in [7], the 
architecture of RACER is inspired by recent results on optimization techniques 
for TBox reasoning [16], namely transformations of axioms (GCIs) [19], model 
caching [8] and model merging [15] (including so-called deep model merging and 
model merging for ABoxes [13]). RACER also provides additional support for 
very large TBoxes (see [10]). 

RACER is implemented in Common Lisp and is available for research pur- 
poses as a server program which can be installed under Linux and Windows 
(http://kogs-www.informatik.uni-hamburg.de/~race). Specific licenses are 
not required. Client programs can connect to the RACER DL server via a very 
fast TCP/IP interface based on sockets. Client-side interfaces for Java and
Common Lisp are available. A C/C++ interface will be available soon.

4 Applications 

An application of RACER for ontology engineering is described in [10]. The the- 
ory behind another application of RACER in the domain of telecommunication 
systems is explained in [2]. RACER has also been used for solving modal logic
satisfiability problems [8] and for database integration tasks. The Java interface
has been developed in order to support a TBox learning application (see [1]). 

5 Outlook 

The integration of techniques for representing “concrete domains” (e.g. linear 
inequalities between real numbers) on the role fillers of an individual has been 
investigated in [14]. In addition, optimization techniques for dealing with quali- 
fied number restrictions [12] will be integrated into RACER in the next release. 